Privacy-Preserving DDoS Attack Detection Using Cross-Domain Traffic in Software Defined Networks

09/19/2018 · by Liehuang Zhu, et al.

Existing distributed denial-of-service (DDoS) attack detection schemes in software-defined networks (SDNs) typically perform detection in a single domain. In reality, abnormal traffic usually affects multiple network domains, so cross-domain attack detection has been proposed to improve detection performance. However, when participating in detection, each SDN domain needs to provide a large amount of real traffic data, from which private information may be leaked. Existing multiparty privacy-protection schemes often achieve privacy guarantees by sacrificing accuracy or increasing the time cost; achieving both high accuracy and reasonable time consumption is a challenging task. In this paper, we propose Predis, a privacy-preserving cross-domain attack detection scheme for SDNs. Predis combines perturbation encryption and data encryption to protect privacy, and employs the computationally simple and efficient k-Nearest Neighbors (kNN) algorithm as its detection algorithm, which we further improve for better efficiency. Via theoretical analysis and extensive simulations, we demonstrate that Predis is capable of achieving efficient and accurate attack detection while securing the sensitive information of each domain.


1 Introduction

Software-Defined Networks (SDNs) have emerged as a new networking paradigm that is liberated from the vertical integration of traditional networks and gives programs and the network their flexibility through a logically centralized network controller [1]. An SDN consists of the data plane, the control plane, and the application plane. The control plane contains controllers that run the control logic and maintain a logically centralized view of the entire network. The controllers abstract the network view into network services and provide easy-to-use interfaces that allow operators, researchers, or third parties to build customized applications and manage the network logically. Users of SDNs do not need to worry about the technical details of the underlying devices; simple programming suffices to rapidly deploy new applications.

SDNs simplify the network management and adapt better to the current situation in which the network size continues to expand rapidly.

However, the centralized control and programmability that characterize SDNs also make them susceptible to the well-known Distributed Denial-of-Service (DDoS) attacks. For instance, the controller, which plays a crucial role in determining the functionality of each component in an SDN, is a main target of DDoS attacks [1]. A compromised controller would result in the paralysis or misbehavior of all switches under its control.

A denial-of-service (DoS) attack can exhaust the resources of a target system, stop its services, and leave them inaccessible to legitimate users. When attackers use two or more compromised computers on the network as "puppet machines" to launch DoS attacks against a specific target, the attack is referred to as a DDoS attack. The puppet machines' IP addresses and the structure and form of the attack packets are random, making it difficult to trace the attacker. DDoS attacks have become a severe threat to today's Internet: they make online services unavailable by overwhelming victims with traffic from multiple attackers. With the number of businesses migrating their operations online growing dramatically, DDoS attacks can lead to significant financial losses. A recent report reveals that DDoS attacks accounted for 22% of data center downtime in 2015 [2].

DDoS attacks essentially operate in three steps [3]: scanning, intrusion, and attack launching. Abnormal traffic from DDoS attacks usually affects multiple paths and network domains (e.g., SDNs domains). For ease of illustration, we analyze the stages of a typical DDoS attack using the LLDOS 1.0 dataset collected by the MIT Lincoln Laboratory [4], as shown in Table 1. Prior to the launch of the attack, abnormal traffic can already be observed during the scanning and intrusion stages. If the victim and the puppet machines of a DDoS attack were located in different network domains, a detection attempt restricted to a single domain would be unable to identify the attack in its primary stages. Thus, involving multiple domains in attack detection helps to achieve more accurate and timely detection [5, 6].

The Number of Flows
Stage      172.16.112.*  172.16.113.*  172.16.114.*  172.16.115.*  Victim  Attacker  Merge
Scanning   424           328           32            296           0       1081      1081
Intrusion  128           0             97            0             0       332       335
Attacking  2             0             0             285           107465  199       107667
Table 1: Analysis of the LLDOS Dataset

In the SDNs environment, collaborative detection across multiple domains requires detailed traffic data from each domain involved, such as the contents of the flow table over the latest few seconds. However, this may raise serious privacy concerns for SDNs operators, because traffic data reveals sensitive information such as source IP addresses, destination IP addresses, and traffic statistics [7], which can be exploited to mine network topology and network connection behaviors [8, 9, 10]. Accordingly, SDNs operators are reluctant to share their detailed intra-domain traffic data with each other. Trade-offs between attack detection efficiency and privacy protection must therefore be carefully balanced in SDNs.

Many DDoS attack detection schemes in traditional networks have been proposed and have shown satisfying results [11, 12, 13, 14, 15, 16, 17]. Extensive studies on DDoS attacks in SDNs have also been conducted, and many traffic classifiers with excellent performance have been proposed (e.g., graphical models [18], the entropy variation of the destination IP address [19], and Support Vector Machines (SVM) [20]). While these detection schemes are usually restricted to a single domain [21], very few studies have considered cross-domain attack detection. Bian et al. [22] proposed a scheme for cross-domain DDoS attack detection in SDNs using a Self-Organizing Map (SOM) as the traffic classifier. However, with a SOM as the classifier, the calculations in the training and testing phases are very complicated, requiring multiple vector multiplications or even more complex divisions. Secure Multi-party Computation (SMC) [23, 24] may enable secure cross-domain anomaly detection (e.g., via secure addition, secure multiplication, and secure comparison protocols). However, these protocols require a large amount of interaction among the participants and calculations on ciphertext, which undoubtedly consumes much of the controllers' bandwidth.
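To make concrete the kind of interaction such protocols entail, the following sketch implements a secure addition protocol via additive secret sharing, a standard SMC building block. This is an illustration of the general technique, not the Predis protocol; all names and the modulus choice are ours.

```python
import random

MODULUS = 2 ** 64  # all shares live in Z_MODULUS

def share(secret, n_parties):
    """Split a secret into n additive shares that sum to it modulo MODULUS."""
    shares = [random.randrange(MODULUS) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares

def secure_sum(per_domain_secrets):
    """Each domain shares its value; any single party only ever sees
    random-looking shares, yet the recombined total is exact."""
    n = len(per_domain_secrets)
    all_shares = [share(s, n) for s in per_domain_secrets]
    # party j locally adds the j-th share of every domain ...
    partials = [sum(col) % MODULUS for col in zip(*all_shares)]
    # ... and only the final recombination reveals the aggregate
    return sum(partials) % MODULUS
```

Even this minimal protocol requires every domain to send one share to every party, which hints at the bandwidth cost the text refers to.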

Cross-domain attack detection can lead to privacy leakage, whereas introducing privacy protection usually comes at the cost of excessive time consumption and a low detection rate. Two challenges must be addressed when detecting DDoS attacks across domains. The first is how to conduct cross-domain DDoS attack detection in SDNs without revealing the privacy of each network domain. Anomaly detection classifiers require detailed traffic data, yet SDNs domains do not trust each other, so the privacy issue must be resolved before multiple SDNs domains can jointly perform anomaly detection. The second challenge is to ensure efficient and accurate DDoS attack detection while preserving privacy well. Strong privacy protection in multi-party cooperation often comes at the cost of accuracy and high time consumption, and it is hard to prioritize one over the other. Facing these dilemmas, we decouple detection into two steps, disturbance and detection, and introduce two servers that collaborate to complete the detection process.

We combine digital cryptography with perturbation encryption to address the first challenge. Transport Layer Security (TLS) protects privacy during data transmission between the two servers and the SDNs controllers, while perturbation encryption ensures that privacy is not compromised during the calculations on the two servers. Thanks to the special design of Predis, the ciphertext produced by perturbation encryption can be computed on directly by the servers, without resorting to complex secure computation protocols.

With respect to the second challenge, we exploit the computational simplicity of k-Nearest Neighbors (kNN), decoupling the kNN algorithm into two steps and embedding the encryption steps into it. Given the training data, kNN can classify test samples by choosing a distance measure, without a training phase (Predis uses the Euclidean distance, defined for n-dimensional vectors x and y as d(x, y) = sqrt(sum_{i=1..n} (x_i - y_i)^2)). The ease of implementing kNN is very useful when we want to embed special operations into the classifier to protect the privacy of the test data. Moreover, kNN has excellent classification accuracy and is not sensitive to outliers, so it maintains high accuracy even when the dataset contains noise. As a highly accurate classifier, kNN has been widely used in many different areas [25, 26, 27].
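As a reference point before any privacy machinery is added, a plain kNN classifier with Euclidean distance fits in a few lines. The sample vectors and labels below are illustrative, not taken from the paper's datasets.

```python
import math
from collections import Counter

def euclidean(x, y):
    # d(x, y) = sqrt(sum_i (x_i - y_i)^2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_classify(train, test_point, k=3):
    """train: list of (vector, label) pairs; majority vote among the k nearest."""
    nearest = sorted(train, key=lambda s: euclidean(s[0], test_point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Note that classification is a single pass over the training set; there is no training phase to protect, which is exactly the property Predis exploits.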

Our contributions are summarized as follows:

  1. We propose Predis, a privacy-preserving cross-domain DDoS attack detection scheme for SDNs, which considers both cross-domain DDoS attack detection and privacy protection in multi-party cooperation. Predis uses the features of SDNs and an improved kNN algorithm to detect DDoS attacks accurately within an effective time, and combines digital cryptography with perturbation encryption to provide confidentiality and protect every participant's privacy.

  2. We prove the security of Predis via the asymptotic approach of computational security in modern cryptography. Through rigorous security analysis, we show that the traffic data provided by each participant is indistinguishable to a potential adversary.

  3. We conduct extensive experiments using multiple authoritative datasets to demonstrate the timeliness and accuracy of Predis. We show that our scheme can not only determine whether traffic is abnormal, but also find abnormal traffic in the early stages of DDoS attacks. Results show that Predis is more accurate than existing detection schemes while protecting participants' privacy.

The rest of the paper is organized as follows. We review the related work in Section 2 and introduce the threat model and security goals in Section 3. We introduce the improved kNN algorithm in Section 4 and describe the design of Predis as well as the concrete calculation steps and encryption details in Section 5. We present the security analysis in Section 6 and the experimental results in Section 7. We conclude this paper in Section 9.

2 Related Work

2.1 Background of DDoS Attacks

DDoS attackers can simultaneously control several computers and create an attack architecture that contains control puppets and attack puppets [3], as shown in Figure 1. The traditional attack architecture is analogous to a dumbbell-shaped structure: the intermediate network is only responsible for data forwarding, while security events and control functions are handled entirely by management, so the network itself has no ability to detect and handle network attacks.

Figure 1: A typical structure of DDoS Attacks

2.2 Summary of DDoS Attacks Detection Methods

There are numerous studies on DDoS attack detection because of the severity and prevalence of DDoS attacks. Here we briefly summarize the related work from two perspectives, i.e., DDoS attack detection in conventional networks and in SDNs, as listed in Table 2.

Detection in conventional networks. DDoS attack detection approaches have been studied extensively in conventional networks, where methods such as entropy-based detection [12], SVM [13], Naive Bayes [15], neural networks [16], cluster analysis [17], Artificial Neural Networks (ANN) [14], and kNN [11] are used as classifiers.

Detection in SDNs. The SDNs controller collects flow-table information and uses selected classifiers to classify network traffic flows as either normal or abnormal. Thanks to the logically centralized controller and the programmability of the network, network administrators can respond to attacks immediately. Classic classification methods such as Bayesian networks [28] and SVM [20], as well as neural-network approaches such as SOM [29, 22, 30] and deep learning [31], are used as traffic classifiers in SDNs.

Network Environment   Classifiers                                      Reference
Conventional Network  Entropy based                                    David et al. [12]
                      Support Vector Machine                           Yusof et al. [13]
                      Naive Bayesian                                   Singh et al. [15]
                      Neural network                                   Hsieh et al. [16]
                      Cluster analysis                                 Wei et al. [17]
                      Artificial Neural Network                        Saied et al. [14]
                      k-Nearest Neighbor                               Thwe et al. [11]
SDNs                  Self-organizing map                              Braga et al. [30]; Xu et al. [29]; Bian et al. [22]
                      Support Vector Machine                           Kokila et al. [20]
                      Entropy variation of the destination IP address  Mousavi et al. [19]
                      Deep Learning                                    Niyaz et al. [31]
Table 2: Summary of Existing DDoS Attacks Detection Methods

Except for Bian et al., none of the methods above considers cross-domain attack detection, so naturally they do not consider privacy protection either; moreover, these methods require complex calculations such as vector multiplications, or even vector divisions, during the testing phase (e.g., Naive Bayes predicts y* = argmax_y P(y) * prod_{i=1..d} P(x_i | y), where x is the test instance, d is the number of dimensions of the test instance, and y is the class label). If these methods were applied directly to cross-domain attack detection, secure computation protocols would be needed to protect privacy. Predis not only protects privacy but also avoids the extensive interactions and calculations required by secure computation protocols.

Bian et al. [22] considered both cross-domain DDoS attack detection and privacy protection. They proposed a privacy-preserving cross-domain detection scheme using SOM as the classifier. However, their method is computationally heavy: the time complexities of training the neural network and of testing are O(nm) and O(m), respectively, where m is the number of neurons and n is the number of training samples. The time complexity of our method in the testing phase is O(log n), and as a type of instance-based learning (or lazy learning), kNN has no training phase. In addition, they failed to consider detecting DDoS attacks in their early stages. We attempt to find anomalies in the early stages of DDoS attacks, because if we detect anomalies before the attacking stage, we can take countermeasures (e.g., blocking ingress traffic with certain attack characteristics) to avoid further losses. However, this has hardly been recognized by most prior studies. Mousavi et al. [19] proposed a method to detect DDoS attacks in SDNs that claims to detect an attack within the first five hundred packets of attack traffic. However, if the puppet machines and the victim are in different SDNs domains, the traffic will not appear abnormal within any single domain.

2.3 Privacy-preserving in Cross-domain Detection

An SDNs domain in Predis refers to a controlled domain under the SDN architecture, i.e., a network domain deployed with SDN techniques that can be independently controlled by an operator. SDNs domains exercise centralized control over data forwarding. The multiple SDNs domains described in this article collaborate, and they may or may not be adjacent physically or geographically. The control plane of each SDNs domain sends its flow table to a specified location (i.e., the computing server). The computing server provides the DDoS detection service and returns the detection results to the controllers. Traditional network domains forward traffic under distributed control and cannot achieve such centralized control.

Privacy-preserving cross-domain attack detection can be seen as a Secure Multi-Party Computation (SMC) problem [32], i.e., the question of how to safely compute a function when no trusted third party is present. It is a hot research subject [33, 34, 35, 36].

Chen et al. [37] presented a cryptographic protocol specially devised for privacy-preserving cross-domain routing optimization in SDNs, but such methods do not apply to the problem of cross-domain attack detection. Martin et al. [38] investigated the practical usefulness of SMC-based solutions, designing optimized secure multiparty computation operations that run efficiently on voluminous input data. Their method provides useful insight into the problem studied in this paper, but their application scenario is not exactly the same as ours.

Predis uses the kNN algorithm as the classifier to perform DDoS attack detection. The kNN technique has been employed to solve privacy-preserving problems before, and several eminent secure kNN protocols already exist. Wong et al. [39] proposed ASPE, a protocol that preserves a special type of scalar product, and constructed two secure schemes that support kNN computation on encrypted data. Elmehdwi et al. [40] devised SkNN, which provides better security when solving the kNN query problem over an encrypted database outsourced to a cloud. Cao et al. [41] proposed MRSE, which defines and solves the problem of multi-keyword ranked search over encrypted cloud data. These secure kNN protocols focus on applying kNN to queries over encrypted data. Although secure kNN does inspire us in some ways, the problem we deal with is DDoS attack detection, which demands higher accuracy and an immediate response and differs from querying over encrypted data.

Compared with previous studies on DDoS attack detection, Predis not only detects DDoS attacks over multiple domains in a privacy-preserving manner, but also attempts to detect them in their early stages.

3 System Model and Threat Model

In this section, we first give an overview of the system model and the roles involved in Predis. Then we present the threat model, followed by the security goals.

3.1 System Model

Predis mainly contains three types of roles: the Computing Server (CS), the Detection Server (DS), and SDNs domains (hereafter we refer to SDNs domains simply as domains unless otherwise stated), as exhibited in Figure 2(a). Domain i is the i-th domain participating in attack detection; it provides data to CS and DS, which in turn provide computing and encryption services for domain i.

The system sequence diagram is shown in Figure 2(b). Each domain sends traffic information to CS for calculation and receives the detection results from DS. CS provides the computing service and sends intermediate results to DS, which provides the detection service based on the intermediate results and returns the detection results to each domain. Thus, CS and DS perform the computation in collaboration. The details of how computing and encryption work in CS and DS are described in Section 5.

(a) System Overview
(b) System Sequence Chart
Figure 2: Privacy-preserving Cross-domain Attacks Detection Scheme Schematic Diagram.

Predis provides an accurate DDoS attack detection service for domains, each of which is unwilling to share private traffic information. Here, we give a formal definition of privacy as follows:

Definition 1

(Privacy). Privacy refers to the flow-table information provided by the domains that participate in detection. Specifically, privacy includes IP Source, IP Destination, Port Source, Port Destination, Length, and Flow Packets.

We define the basic operations of the three roles mentioned above as three functions with inputs and outputs. Each function is designed to run in real time on continuous inputs, with the data partitioned into fixed time intervals. Predis has a set of input peers, who want to jointly compute the final result of Predis on their private data without the slightest relevant disclosure. Predis also has players called privacy peers that perform the computation of Predis by simulating a trusted third party (TTP) [38]. Domains are both input peers and privacy peers, while CS and DS are privacy peers.

3.2 Threat Model

We abstract the cross-domain privacy-preserving DDoS attack detection problem with a threat model. In the threat model, there are two types of adversaries, namely the external adversary and the semi-honest adversary.

External adversary. An adversary that illegally obtains data in the transmission process, through Internet eavesdropping, data interception, or other means, for its own purposes.

Semi-honest adversary. A curious participant who follows the protocol properly to fulfill service functions, but tries its best to infer sensitive or private information from intermediate results of calculation, or even colludes with other participants.

Privacy peers set up a secure, confidential, and authentic channel connecting each other to resist the external adversary; in Predis, we use TLS to build this secure channel. We adopt the semi-honest assumption for all privacy peers. Honest privacy peers follow the protocol and do not combine their information. Semi-honest privacy peers do follow the protocol but try to infer input peers' privacy as much as possible from the values they learn, possibly by combining their information. Domains want to obtain correct attack detection results; while following the prescribed steps, some domains may try to infer other domains' privacy for their own purposes. CS and DS provide the correct calculation service, but may use the intermediate results generated during calculation to infer and pry into domains' privacy.

We assume that all privacy peers have the potential to act as external adversaries, using eavesdropping or other methods to illegally obtain input peers' private data. Besides the roles included in this scheme, any other external eavesdropper is also an adversary we need to tackle.

3.3 Security Goal

The purpose of this paper is to obtain accurate cross-domain DDoS attack detection results under the premise of privacy protection. Privacy peers may steal privacy as external adversaries or as semi-honest adversaries. Furthermore, privacy peers may collude with each other. In our solution, we allow one or more domains to collude with each other, with CS, and with DS. Based on these considerations, we make the following assumptions:

  1. Each domain performs its functions honestly but may have an interest in the private information of other domains.

  2. CS and DS perform the calculation process correctly but may have an interest in obtaining domains' private information.

  3. CS or DS may collude with one or more domains, since semi-honest privacy peers follow the protocol but try to infer other peers' privacy as much as possible from the values they learn.

  4. CS and DS do not collude with each other. In reality, DS and CS can be deployed by different operators, and operators are likely to have conflicts of interest, so it is assumed that CS does not collude with DS.

Before describing our security goals, we introduce a security definition (i.e., Definition 2), an adversarial indistinguishability experiment as shown in Table 3, and a definition of negligibility (i.e., Definition 3) for a probabilistic polynomial-time (PPT) adversary A [42].

Definition 2

(Indistinguishability). A private-key encryption scheme Pi = (Gen, Enc, Dec) has indistinguishable encryptions under an attack if for all PPT adversaries A there is a negligible function negl such that Pr[PrivK_eav(A, Pi)(n) = 1] <= 1/2 + negl(n), where the probability is taken over the randomness used by A, as well as the randomness used in the experiment.

Definition 3

(Negligible). A function f from the natural numbers to the nonnegative real numbers is negligible if for every positive polynomial p there is an N such that for all integers n > N it holds that f(n) < 1/p(n).

We aim at achieving the security objective of keeping privacy of each domain. We specify our security goals as follows:

  1. For CS and DS, input peers' privacy is protected.

  2. For domains, other input peers’ privacy is protected.

The indistinguishability experiment PrivK_eav(A, Pi)(n):
1. A key k is generated by running Gen(1^n).
2. The adversary A is given input 1^n, and outputs a pair of messages m0, m1 of the same length.
3. A uniform bit b in {0, 1} is chosen, and then a ciphertext c <- Enc_k(m_b) is computed and given to A.
4. A outputs a bit b'.
5. The output of the experiment is defined to be 1 if b' = b, and 0 otherwise. In the former case, we say that A succeeds.
Table 3: Indistinguishability Experiment
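The experiment in Table 3 can be phrased as a short program. The sketch below instantiates the scheme with a one-time pad purely for illustration (this is not the Predis cipher), and checks that an adversary with no useful strategy succeeds with probability close to 1/2.

```python
import secrets

def keygen(n):
    return secrets.token_bytes(n)                 # uniform n-byte key

def enc(key, m):
    return bytes(k ^ b for k, b in zip(key, m))   # one-time pad

def privk_eav(adversary, n=16):
    """One run of the experiment in Table 3; returns 1 iff the adversary succeeds."""
    key = keygen(n)
    m0, m1 = adversary.choose(n)                  # two equal-length messages
    b = secrets.randbelow(2)                      # uniform challenge bit
    c = enc(key, (m0, m1)[b])
    return int(adversary.guess(c) == b)

class BlindGuesser:
    """An adversary that picks fixed messages and always guesses b' = 0."""
    def choose(self, n):
        return bytes(n), b"\xff" * n
    def guess(self, c):
        return 0
```

With a one-time pad the ciphertext is uniform regardless of b, so over many runs this adversary's success rate hovers around 1/2, as Definition 2 requires.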

4 Classification Method

To adapt to the proposed privacy protection scheme, we design a classifier based on the kNN algorithm, decoupling it into two steps and embedding the encryption steps into it. In this section, we introduce the details of how traffic classification is carried out.

4.1 Improved kNN as Classifier

In general, kNN is implemented via linear scanning [43]: we must calculate the distance between the test data and every training instance, and then sort the distances to find the nearest instances. When the training dataset is very large, this computation is very time-consuming.

A KD-tree is a balanced binary tree that divides the entire attribute space into regions according to the attributes of the dataset, so that query operations can be restricted to specific regions. Best Bin First (BBF) is an optimization algorithm for querying a KD-tree; its main idea is to sort the nodes on the query path so that backtracking always starts from the highest-priority tree node. Storing the training dataset in a KD-tree and searching with BBF avoids computing the distance between the test data and every training instance, and thereby improves efficiency. Readers interested in KD-trees or BBF may consult [44]; since they are not the focus of Predis, we do not describe them in detail here.

In Section 3, we introduced the system model, in which CS holds the training dataset and DS provides the detection service. We therefore decouple kNN into two steps. In the first step, CS builds a KD-tree from the training dataset and calculates the preliminary results of the distances between the test data and the ordered training data. In the second step, DS takes the preliminary results and finds the nearest instances using BBF. The time complexity of kNN with a linear scan is O(n), while with BBF it is O(log n). When the dataset is large, the time saved by BBF is substantial. The main steps are shown in Algorithm 1.

1: Input: training dataset T, test instance x, timeout limit t.
2: Output: detection result R.
3: Build a KD-tree based on the dimensions of the training data in CS.
4: Calculate the preliminary results from the ordered training data in CS.
5: Remove the perturbation from the preliminary results to get the correct distances in DS.
6: The KD-tree root is added into the traversal queue.
7: Initialize a priority queue Q of the current k nearest instances.
8: while the traversal queue is not null and the timeout t is not reached do
9:      node <- the traversal queue's top.
10:     Get the distance between x and node.
11:     if the distance < Q's top then
12:         Remove Q's top.
13:         Insert node into Q.
14:     if x's n-th dimension's value > node's n-th dimension's value then
15:         node's left child enters the traversal queue;
16:         traverse the right subtree.
17:     else
18:         node's right child enters the traversal queue;
19:         traverse the left subtree.
20: Get the detection result R from queue Q.
21: return R.
Algorithm 1 Improved kNN Algorithm
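To make the decoupled search concrete, here is a self-contained sketch of a KD-tree with a best-bin-first nearest-neighbor query. For brevity it returns only the single nearest sample (k = 1), whereas Algorithm 1 keeps a queue of the k nearest and splits the work between CS and DS; structure and names are ours.

```python
import heapq

def build_kdtree(samples, depth=0):
    """samples: list of (vector, label) pairs; split axes cyclically, median at root."""
    if not samples:
        return None
    axis = depth % len(samples[0][0])
    samples = sorted(samples, key=lambda s: s[0][axis])
    mid = len(samples) // 2
    return {'sample': samples[mid], 'axis': axis,
            'left': build_kdtree(samples[:mid], depth + 1),
            'right': build_kdtree(samples[mid + 1:], depth + 1)}

def bbf_nearest(root, query, max_checks=10**6):
    """Best-bin-first search: postponed branches are revisited in order of how
    close their splitting plane lies to the query, so the best bins come first."""
    best_d2, best_label = float('inf'), None
    frontier = [(0.0, 0, root)]        # (bin distance, tie-breaker, subtree)
    tick, checks = 1, 0
    while frontier and checks < max_checks:
        _, _, node = heapq.heappop(frontier)
        while node is not None and checks < max_checks:
            vec, label = node['sample']
            d2 = sum((a - b) ** 2 for a, b in zip(vec, query))
            checks += 1
            if d2 < best_d2:
                best_d2, best_label = d2, label
            diff = query[node['axis']] - vec[node['axis']]
            near, far = (node['left'], node['right']) if diff < 0 else (node['right'], node['left'])
            if far is not None:        # remember the far bin for later
                heapq.heappush(frontier, (diff * diff, tick, far))
                tick += 1
            node = near                # keep descending the near side
    return best_d2, best_label
```

With an unbounded check budget the search is exact; capping max_checks trades accuracy for speed, which is the same knob the timeout t in Algorithm 1 turns.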

4.2 Feature Selection

The proposed DDoS attack detection scheme is based on the flow table obtained from SDNs controllers. Since the flow table contains a lot of redundant information, which affects both the detection efficiency and the results, we extract feature information from it. Normal traffic is generally interactive, since its purpose is to obtain or provide services, whereas the numbers of ports and source IP addresses increase significantly when DDoS attacks occur. Another feature of DDoS attacks is source IP spoofing, which usually produces many flows, each with few packets and few bytes; normal flows usually contain many packets and more bytes. We therefore compute the median, rather than the mean, of packets per flow and bytes per flow to reinforce this feature, because the mean may smooth it out.

To quantify these characteristics, five parameters are used in the feature selection module, including MPF, MBF, PCF, GOP and GSI, which are elaborated as follows:

  1. Median of Packets per Flow (MPF), the median number of packets over all flows. We rank the flows in ascending order by the number of packets per flow, and then compute the median value according to Formula (1):

    MPF = p_{(n+1)/2} if n is odd; MPF = (p_{n/2} + p_{n/2+1}) / 2 if n is even,   (1)

    where p_j is the number of packets of the j-th ranked flow and n is the number of flows.

  2. Median of Bytes per Flow (MBF), the median number of bytes over all flows, computed analogously after ranking the flows in ascending order by the number of bytes per flow.

  3. Percentage of Correlative Flow (PCF), the fraction of flows with interactive features among all flows. A flow from source to destination and the reverse flow from destination to source, using the same protocol, form a correlative pair; PCF is the ratio of the number of flows belonging to such pairs to the total number of flows.

  4. Growth of Ports (GOP), the growth rate of the number of ports within a fixed time: GOP = dP / T, where T is the fixed time interval and dP is the growth in the number of ports.

  5. Growth of Source IP Addresses (GSI), the growth rate of the number of source IP addresses within the same fixed time: GSI = dS / T, where dS is the growth in the number of source IP addresses.
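The list above can be sketched as a small feature-extraction routine. The flow-record field names are ours, and GOP/GSI are approximated here as distinct ports and source IPs per second of the interval, a simplification of the growth rates defined above; PCF is omitted since its pairing rule depends on protocol details.

```python
from statistics import median

def extract_features(flows, interval):
    """flows: flow records seen in one collection interval, as dicts with
    'packets', 'bytes', 'src_ip' and 'src_port' keys (hypothetical names).
    Returns (MPF, MBF, GOP, GSI)."""
    mpf = median(f['packets'] for f in flows)   # median, not mean: harder to smooth out
    mbf = median(f['bytes'] for f in flows)
    gop = len({f['src_port'] for f in flows}) / interval   # port growth rate
    gsi = len({f['src_ip'] for f in flows}) / interval     # source-IP growth rate
    return mpf, mbf, gop, gsi
```

A spoofed-source flood drives MPF and MBF down while GOP and GSI shoot up, which is the separation the classifier relies on.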

5 Privacy-Preserving Cross-domain Attacks Detection Scheme

In this section, we describe the workflow of Predis and detail the processes of how to combine privacy protection in DDoS attacks detection.

5.1 Encryption in Data Transmission Process

To prevent traffic data from being leaked in the transmission process, we leverage TLS [45].

TLS is a security protocol that provides secure connections for two applications communicating across a network, and it is designed to prevent eavesdropping and tampering. Before application-layer communication begins, the TLS protocol has already negotiated the encryption algorithm and the communication keys and authenticated the server; application-layer protocols can then be layered transparently on top of TLS. TLS consists of three basic steps: the client requests and verifies the server's public key; the two parties negotiate a session key; and the two parties use the session key for encrypted communication. TLS is considered safe and reliable for information transfer across a network up to now.
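As an illustration of the channel setup (not a prescription of Predis's deployment), a domain-side TLS context in Python can enforce exactly the guarantees described above: server authentication, hostname checking, and refusal of legacy protocol versions.

```python
import ssl

def make_channel_context():
    """TLS context for the domain-to-server channel: certificates verified,
    hostnames checked, pre-TLS-1.2 protocol versions refused."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx

# A domain would then wrap its socket before sending data, e.g.
# (cs_host/cs_port are placeholders):
#   with socket.create_connection((cs_host, cs_port)) as raw:
#       with make_channel_context().wrap_socket(raw, server_hostname=cs_host) as tls:
#           tls.sendall(payload)
```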

1: Input: traffic data of the flow table.
2: Output: ciphertext C passed to CS, perturbation parameters R passed to DS.
3: Initialize the seven-tuple set V using the feature selection module's formulas and the traffic data of the flow table.
4: while true do
5:      for each seven tuple v in V do
6:          for each attribute v_j of v do
7:              Randomly generate a disturbance parameter r_j.
8:              c_j <- v_j + r_j.
9:      for each perturbed tuple do
10:         Append it to the ciphertext C.
11:     for each disturbance parameter r do
12:         Append it to R.
13:     return R to DS.
14:     return C to CS.
Algorithm 2 Traffic Pretreatment in Each SDNs Domain

5.2 Traffic Pretreatment in Domains

In traffic pretreatment, domains need to collect traffic and abstract each piece of traffic information as a seven tuple, and then generate perturbation parameters. The obtained seven tuple is encrypted with the perturbation parameters. Finally, each domain transmits the encrypted seven tuple to CS and the perturbation parameters to DS, as shown in Algorithm 2.

Domains collect and transmit traffic information every 3 seconds: an overly long interval would leave the network paralyzed before the attacks are detected, while an overly short one would drive the resource utilization of the detection module so high that the controller could not handle other requests, placing a heavy load on the link between the controller and its switches.

(a) SDNs Flow Table Content
(b) Traffic Pretreatment
Figure 3: The Format of SDNs Flow Table and Traffic Pretreatment.

As mentioned in Section 4.2, the information needed in detection is the source IP, destination IP, source port, destination port, flow bytes, and flow packets. Domains process the entries written in the flow table with the equations described in Section 4.2 and calculate MPF, MBF, PCF, GOP and GSI. We define a seven tuple as (Serial Number, Time, MPF, MBF, PCF, GOP, GSI). The functions of Serial Number and Time are similar to a primary key in a relational database: together they uniquely identify the flow table item generated by a domain. In experiments we set the Serial Number to the index of the domain participating in detection, and Time to the timestamp of the flow table item. Each attribute in the seven tuple is stored as a 33-bit binary value (32 data bits plus one additional bit as an overflow flag), so the total length of the seven tuple is 231 bits. Attributes shorter than 33 bits are zero-padded in front.
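The 33-bit field layout can be sketched as follows; this is an illustrative encoding of the stated 7 × 33 = 231-bit layout, not the authors' exact serialization:

```python
FIELD_BITS = 33          # 32 data bits plus one overflow flag bit
NUM_FIELDS = 7           # Serial Number, Time, MPF, MBF, PCF, GOP, GSI
MASK = (1 << FIELD_BITS) - 1

def pack_tuple(values):
    """Pack seven attributes into one 231-bit integer.

    Attributes shorter than 33 bits are implicitly zero-padded in
    front, matching the layout described in the text.
    """
    assert len(values) == NUM_FIELDS
    packed = 0
    for v in values:
        assert 0 <= v <= MASK, "attribute overflows 33 bits"
        packed = (packed << FIELD_BITS) | v
    return packed

def unpack_tuple(packed):
    """Recover the seven attributes from the packed integer."""
    values = []
    for _ in range(NUM_FIELDS):
        values.append(packed & MASK)
        packed >>= FIELD_BITS
    return list(reversed(values))

tup = [3, 1537333337, 12, 900, 45, 7, 2]
assert unpack_tuple(pack_tuple(tup)) == tup
assert pack_tuple(tup).bit_length() <= 231
```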

In each domain, the disturbance parameters are added to the seven tuple. Using TLS, domains securely deliver the encrypted seven tuple to CS and the disturbance parameters to DS. The flow table content is shown in Figure 3(a), and the schematic diagram of the traffic pretreatment in domains is shown in Figure 3(b).
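The perturbation split described above can be sketched in a few lines; the modular addition and the 33-bit field width are our own assumptions for illustration:

```python
import secrets

FIELD_MAX = 2 ** 33  # each attribute fits in 33 bits

def perturb(seven_tuple):
    """Split one seven tuple into what CS sees and what DS sees.

    CS receives only x + r (mod 2^33) per attribute; DS receives only
    r. Neither server alone can recover the plaintext tuple x.
    """
    r = [secrets.randbelow(FIELD_MAX) for _ in seven_tuple]
    encrypted = [(x + ri) % FIELD_MAX for x, ri in zip(seven_tuple, r)]
    return encrypted, r

x = [1, 1537333337, 12, 900, 45, 7, 2]
enc, r = perturb(x)
# Only a party holding both shares can reconstruct x.
assert [(e - ri) % FIELD_MAX for e, ri in zip(enc, r)] == x
```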

5.3 Preliminary Calculation in CS

Upon receiving the encrypted seven tuples, CS calculates the preliminary results used for attack detection. Then, CS sends the results to DS using TLS.

The calculation process in CS is exhibited in Algorithm 3. Predis employs kNN for attack detection, so computing distances is an important step. CS calculates the preliminary results of the distance between the test data and the training data: it operates directly on the received encrypted seven tuples and obtains the distance between the disturbed data and the training data. We leave the removal of the perturbation, and thus the exact distance result, to DS. The schematic diagram is shown in Figure 4(a).
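A sketch of CS's preliminary step, under the assumption that the preliminary result is the per-attribute difference between each training tuple and each encrypted tuple:

```python
def preliminary_differences(training, encrypted):
    """CS-side step: for every (training, encrypted) pair, compute the
    per-attribute difference t - (x + r).

    The result is still masked by the unknown perturbation r, so CS
    never sees the true distance between t and x.
    """
    return [
        [[t_a - e_a for t_a, e_a in zip(t, e)] for e in encrypted]
        for t in training
    ]

training = [[10, 20, 30], [1, 2, 3]]
encrypted = [[4, 5, 6]]  # x + r for some hidden x and r
prelim = preliminary_differences(training, encrypted)
assert prelim[0][0] == [6, 15, 24]  # 10-4, 20-5, 30-6
```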

(a) Preliminary Calculation in CS
(b) Attacks Detection in DS
Figure 4: Schematic Diagram of Preliminary Calculation and Attacks Detection.
1:Training data T = {t_1, …, t_s}, set of encrypted seven tuples E.
2:Preliminary results D' sent to DS.
3:while true do
4:     for each training tuple t_i in T do
5:          for each encrypted tuple e_j in E do
6:               d'_{i,j} := t_i − e_j.                
7:     for each d'_{i,j} do
8:          D' := D' ∪ {d'_{i,j}}.      
9:     return D' to DS.
Algorithm 3 Preliminary Calculation in CS

5.4 Attacks Detection in DS

The attack detection process in DS is abstracted in Figure 4(b) and exhibited in Algorithm 4. Upon receiving the domains' perturbation parameters and CS's preliminary distance results, DS removes the perturbation from the preliminary results to get the correct distances. The improved kNN uses the correct distances to obtain the detection results. Finally, if the classifier finds DDoS attacks, DS returns an alarm (Serial Number and Time) to the domains. Each domain's operator responds appropriately after receiving an alarm.

The calculation results of CS are the training data minus the perturbed seven tuples, i.e., t − (x + r). Since DS has the perturbation parameters r, it can recover the correct differences for attack detection by adding the perturbation parameters back to the calculation results of CS: (t − (x + r)) + r = t − x. Finally, the improved kNN calculates the results of DDoS attack detection.
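The cancellation (t − (x + r)) + r = t − x can be checked end to end with a small sketch; the concrete numbers are arbitrary:

```python
import math

def corrected_distance(prelim_diff, r):
    """DS-side step: add the perturbation back and compute the
    Euclidean distance, since (t - (x + r)) + r = t - x.
    """
    diff = [d + ri for d, ri in zip(prelim_diff, r)]
    return math.sqrt(sum(v * v for v in diff))

# End-to-end check that the perturbation cancels out exactly.
t = [10.0, 20.0, 30.0]          # one training tuple
x = [4.0, 5.0, 6.0]             # one plaintext test tuple
r = [7.0, -3.0, 2.5]            # the domain's perturbation
encrypted = [xi + ri for xi, ri in zip(x, r)]
prelim = [ti - ei for ti, ei in zip(t, encrypted)]   # CS's output
plain = math.sqrt(sum((ti - xi) ** 2 for ti, xi in zip(t, x)))
assert abs(corrected_distance(prelim, r) - plain) < 1e-9
```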

1:Preliminary calculation results D' from CS, perturbation parameters R from the domains.
2:Alarm from detection.
3:while true do
4:     for each d'_{i,j} in D' do
5:          for each attribute do
6:               d_{i,j} := d'_{i,j} + r_j.           
7:          Perform attack detection using d_{i,j}.
8:          Get the detection result by Algorithm 1.      
9:     return the alarm to the domains.
Algorithm 4 Attacks Detection in DS

6 Security Analysis

As described in Section 3.3, our security goal is to protect the privacy of each input peer. Predis uses TLS to protect privacy during data transmission. TLS is generally accepted as secure for data transfer across a network: besides the intended receiver, no external eavesdropper can read or tamper with the data. Therefore, we do not analyze the security of TLS here.

A scheme is secure if any PPT adversary succeeds in breaking the scheme with at most negligible probability [42]. In other words, if any PPT adversary succeeds in the indistinguishability experiment shown in Table 3 with at most negligible probability, we can say that the scheme is secure. This is the asymptotic approach of modern cryptography to proving the security of a scheme, and we use this idea in the following.

In this section, we use the idea of the asymptotic approach and present proofs by showing indistinguishability in the following two situations: for CS and DS, the input peers' private information is indistinguishable, and for each domain, the other input peers' private information is indistinguishable.

Before the formal security analysis, we first set out the meaning of each symbol: X is the private information mentioned in Section 3.3; E is the set of encrypted X; R is the set of disturbance parameters of each domain; D' is the preliminary calculation result output by CS.

As for the domains, the legal data each domain holds is its own data and the detection result. When a domain tries to obtain private information from others, e.g., by external eavesdropping, that information is indistinguishable to it.

As for CS, the legal data it has is E. We construct an encryption scheme Π as shown in Table 4; we give a theorem for it (Theorem 1) and prove it. If CS does not collude with any domain, it is merely a ciphertext-only eavesdropping adversary. If CS colludes with one or more domains, the attack becomes a chosen-plaintext attack (CPA) against the encryption scheme of Construction Π. In a ciphertext-only attack, the only thing the adversary needs to do is eavesdrop on the public communication channel over which encrypted messages are sent [42]. In a chosen-plaintext attack, the adversary is assumed to be able to obtain encryptions and/or decryptions of plaintexts/ciphertexts of its choice [42]. A chosen-plaintext adversary has more useful information than a ciphertext-only adversary and is harder to prevent. If we can stop a chosen-plaintext adversary, we can also stop a ciphertext-only adversary.

By Proof 1, we demonstrate that Construction Π is a CPA-secure private-key encryption scheme for messages of length n. Thus, the input peers' private information is indistinguishable for CS.

Theorem 1

If F is a pseudorandom function, then Construction Π is a CPA-secure private-key encryption scheme for messages of length n.

Proof 1

Let Π̃ be an encryption scheme that is exactly the same as Π, except that a truly random function f is used in place of F_k. Fix an arbitrary PPT adversary A, and let q(n) be an upper bound on the number of queries that A makes to its encryption oracle. We show that there is a negligible function negl as follows and prove this by reduction.

| Pr[PrivK^cpa_{A,Π}(n) = 1] − Pr[PrivK^cpa_{A,Π̃}(n) = 1] | ≤ negl(n).     (2)

We use A to construct a distinguisher D for the pseudorandom function F. The distinguisher D is given oracle access to some function O, and its goal is to determine whether this function is "pseudorandom" (i.e., equal to F_k for uniform k) or "random". To do this, D emulates the experiment PrivK^cpa for A in the manner described below, and observes whether A succeeds or not. If A succeeds, then D guesses that its oracle must be a pseudorandom function, whereas if A does not succeed, then D guesses that its oracle must be a random function.

D runs in polynomial time since A does. The key points are as follows:

If D's oracle is a pseudorandom function, then the view of A when run as a subroutine by D is distributed identically to the view of A in experiment PrivK^cpa_{A,Π}(n). This is because, in this case, a key k is chosen uniformly at random, and then every encryption is carried out by choosing a uniform r and setting the ciphertext equal to ⟨r, F_k(r) ⊕ m⟩, exactly as in Construction Π.

If D's oracle is a random function, then the view of A when run as a subroutine by D is distributed identically to the view of A in experiment PrivK^cpa_{A,Π̃}(n). This can be seen exactly as above, with the only difference being that a uniform function f is used instead of F_k.

Pr[PrivK^cpa_{A,Π̃}(n) = 1] ≤ 1/2 + q(n)/2^n.     (3)

Through Formula (3), we know Pr[PrivK^cpa_{A,Π̃}(n) = 1] ≤ 1/2 + q(n)/2^n. Combining the above with the assumption that F is a pseudorandom function, there exists a negligible function negl for which Pr[PrivK^cpa_{A,Π}(n) = 1] ≤ 1/2 + negl(n). From Definition 2, we have proved that Construction Π is a CPA-secure private-key encryption scheme for messages of length n.

Let F be a pseudorandom function. Define a private-key encryption scheme Π for messages of length n as follows.
Gen: on input 1^n, choose uniform k ∈ {0,1}^n and output it.
Enc: on input a key k ∈ {0,1}^n and a message m ∈ {0,1}^n, choose uniform r ∈ {0,1}^n and output the ciphertext
           c := ⟨r, F_k(r) ⊕ m⟩.
Dec: on input a key k ∈ {0,1}^n and a ciphertext c = ⟨r, s⟩, output the plaintext message
           m := F_k(r) ⊕ s.
Table 4: Construction of Π
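An executable sketch of this style of construction, with HMAC-SHA256 standing in for the pseudorandom function F (the paper does not fix a concrete F, so this instantiation is our own choice for illustration):

```python
import hmac
import hashlib
import secrets

BLOCK = 32  # message length in bytes (n = 256 bits)

def prf(key: bytes, r: bytes) -> bytes:
    # HMAC-SHA256 stands in for the pseudorandom function F_k.
    return hmac.new(key, r, hashlib.sha256).digest()

def gen() -> bytes:
    return secrets.token_bytes(BLOCK)          # uniform key k

def enc(k: bytes, m: bytes) -> tuple:
    r = secrets.token_bytes(BLOCK)             # fresh uniform r
    pad = prf(k, r)
    c = bytes(a ^ b for a, b in zip(pad, m))   # F_k(r) XOR m
    return r, c

def dec(k: bytes, ct: tuple) -> bytes:
    r, c = ct
    pad = prf(k, r)
    return bytes(a ^ b for a, b in zip(pad, c))

k = gen()
m = b"a 32-byte message for the demo!!"
assert len(m) == BLOCK
assert dec(k, enc(k, m)) == m
```

Because a fresh r is drawn for every call, encrypting the same message twice yields different ciphertexts, which is the property the CPA argument relies on.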

As for DS, the legal data it has is R and D'. Since D' = T − E and E = X + R, DS can compute D' + R = T − X. We construct an encryption scheme Π' as shown in Table 5. If DS does not collude with any domain, it is a ciphertext-only eavesdropping adversary against encryption scheme Π'. If DS colludes with one or more domains, then, having none of CS's keys, the colluders are still unable to obtain the other input peers' X, while DS owns the perturbation parameters R. Therefore, this situation is a chosen-plaintext attack against encryption scheme Π', whose security is stated in Theorem 2. By Proof 2, we demonstrate that the input peers' private information is indistinguishable for DS.

Gen: choose a uniform training set T and output it as the key.
Enc: on input a key T and a message m, output the ciphertext
           c := T − m.
Dec: on input a key T and a ciphertext c, output the plaintext message
           m := T − c.
Table 5: Construction of Π'
Theorem 2

The encryption scheme Π' is a CPA-secure private-key encryption scheme for messages of length n.

Proof 2

Through Formula (4), we bound the adversary's success probability in a ciphertext-only attack, where P_g is the probability of guessing the training set.

Pr[PrivK^eav_{A,Π'}(n) = 1] ≤ 1/2 + P_g.     (4)

In general, there are 2500 records in the training data set for kNN, each of them being 32 × 5 = 160 bits in our scheme. Thus, P_g = 2^{−160×2500} = 2^{−400000}. With the idea of the asymptotic approach, we consider P_g negligible.

In a chosen-plaintext attack, the success probability could be as high as 1/2 + q · P_g, where q is the number of queries to the oracle.

If q is polynomial in n, q · P_g is still negligible. In encryption scheme Π', q is polynomial in the number of collusive domains, so q · P_g is negligible. Encryption scheme Π' therefore has indistinguishable encryptions under a chosen-plaintext attack.

7 Evaluation

This section evaluates Predis in terms of accuracy, expansibility, time consumption, and compatibility.

7.1 Preliminary

Dataset. Since simulating attack scenarios has a major defect in terms of traffic diversity, we employ five sets of public traffic traces for our experiments, including the CAIDA “DDoS attacks 2007” traces [46], the CAIDA Anonymized 2008 Internet traces [47], the 2000 DARPA LLDOS 1.0 and LLDOS 2.0.2 traces [4], the 1999 DARPA traces [4], and the KDD Cup 1999 traces [48]. Besides, we deployed a DDoS attack experiment and captured relevant traffic traces for our experiments. The file format of these datasets is .pcap, which contains every packet's details. We parse these .pcap files into flow statistics to simulate the flow table collected by a controller in SDNs.
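The flow-statistics step can be sketched by aggregating packet records into flow-table-like counters; the record field names and the 5-tuple flow key below are our own simplification, not the paper's parser:

```python
from collections import defaultdict

def packets_to_flows(packets):
    """Aggregate packet records into flow-table-like entries.

    A flow is keyed by (src IP, dst IP, src port, dst port, protocol);
    per flow we accumulate packet and byte counters, mirroring the
    counters an SDN switch keeps per flow entry.
    """
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for pkt in packets:
        key = (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"], pkt["proto"])
        flows[key]["packets"] += 1
        flows[key]["bytes"] += pkt["length"]
    return dict(flows)

pkts = [
    {"src": "10.0.0.1", "dst": "10.0.0.9", "sport": 1234, "dport": 80, "proto": "TCP", "length": 60},
    {"src": "10.0.0.1", "dst": "10.0.0.9", "sport": 1234, "dport": 80, "proto": "TCP", "length": 1500},
    {"src": "10.0.0.2", "dst": "10.0.0.9", "sport": 4321, "dport": 80, "proto": "TCP", "length": 60},
]
flows = packets_to_flows(pkts)
assert len(flows) == 2
assert flows[("10.0.0.1", "10.0.0.9", 1234, 80, "TCP")] == {"packets": 2, "bytes": 1560}
```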

Using the combinations of these traces, we define three datasets for experiments.

Domains 1999 DARPA LLDOS 1.0 LLDOS 2.0.2
Packets Flows Packets Flows Packets Flows
172.16.112.* 1573963 427056 1237 1104 376 354
172.16.113.* 585996 236122 338 328 255 254
172.16.114.* 835099 311354 344 258 24 16
172.16.115.* 5090 3838 2428 1336 632 611
131.84.1.31(Victim) 35645 17830 108509 107465 2100 2003
202.77.162.213(Attacker) 52606 24498 3722 3174 1724 1634
Domains Merge 2502403 1020698 116578 107667 5111 560
Table 6: Statistics of Dataset 1

Dataset 1. The 1999 DARPA and 2000 LLDOS traces were collected from the same network topology. Thus, in Dataset 1, we use the 1999 DARPA traces as normal traffic, and use the 2000 LLDOS traces as anomaly traffic. We segment domains by the IP address segment. Victims and attackers are located in different domains. Statistics of Dataset 1 are shown in Table 6.

Dataset 2. All traffic in the CAIDA were collected from both directions of an OC-192 Internet backbone link by CAIDA’s equinix-chicago monitor. Thus, in Dataset 2, we use the CAIDA Anonymized 2008 Internet traces as normal traffic, and use the CAIDA “DDoS attacks 2007” traces as anomaly traffic.

Dataset 3. We used Python and Scapy (a Python library for interactive packet manipulation) to simulate a synchronous (SYN) flood attack. To simulate DDoS attacks, we used several hosts to launch SYN flood attacks against one host, and then collected 5 minutes of abnormal traffic on this victim host. To obtain abnormal traffic that was as clean as possible, we closed all the applications on the victim host while collecting. We collected another 45 minutes of normal traffic on this victim host when there were no attacks. Statistics of Dataset 3 are shown in Table 7.

Packets Flows
Anomaly traffic 114214 79440
Normal traffic 941904 18582
Table 7: Statistics of Dataset 3

In addition, the KDD Cup 1999 traces are used alone to evaluate the performance of Predis in terms of compatibility. Statistics of the KDD Cup 1999 traces are shown in Table 9.

Domains CAIDA Anonymized 2008 CAIDA DDoS attacks 2007
Packets Flows Packets Flows
Sample 1 1370524 243997 435428 63093
Sample 2 1377329 243724 426769 62574
Sample 3 1286528 237956 445796 63749
Sample 4 1299980 234976 431176 63349
Sample 5 1340870 246775 398453 59467
Sample 6 1338945 243518 413624 61057
Table 8: Statistics of Cross Validation Using Dataset 2

Cross-Validation. To evaluate the performance difference between Predis and others, we employ cross-validation for each dataset.

Dataset 1. To preserve the traffic characteristics of the DDoS attacks in each phase of LLDOS, we do not partition LLDOS's data. When LLDOS 1.0 is used as the training dataset, LLDOS 2.0.2 is used as the test dataset, and vice versa. The 1999 DARPA traces always act as background traffic; we select 50% of them to create the training dataset and use the remaining 50% for validation.

Dataset 2. As network traffic data is time-dependent, we divide the traces on a time basis. When performing cross-validation, we divide both the CAIDA Anonymized 2008 Internet traces and the CAIDA “DDoS attacks 2007” traces into 6 partitions of 3 seconds each. Each time we take one of them as the test dataset and the remainder as the training dataset. Cross-validation statistics of Dataset 2 are shown in Table 8.
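The time-based split can be sketched as follows; the record format and the 3-second window default are illustrative assumptions:

```python
def time_partitions(records, num_parts=6, window=3.0):
    """Split time-stamped records into fixed-length windows.

    `records` is a list of (timestamp, payload) pairs; each partition
    covers one `window`-second slice, as in the 6-fold split above.
    """
    if not records:
        return [[] for _ in range(num_parts)]
    start = min(ts for ts, _ in records)
    parts = [[] for _ in range(num_parts)]
    for ts, payload in records:
        idx = min(int((ts - start) // window), num_parts - 1)
        parts[idx].append(payload)
    return parts

recs = [(0.0, "a"), (2.9, "b"), (3.1, "c"), (17.0, "d")]
parts = time_partitions(recs)
assert parts[0] == ["a", "b"] and parts[1] == ["c"] and parts[5] == ["d"]
```

In cross-validation, each of the six partitions is held out once as the test set while the other five form the training set.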

Type of Attacks Normal DoS Prob U2R
Number of Connections 972781 391502 4107 218
Table 9: Statistics of KDD Cup 1999 Traces

Methods to Compare. Predis is a privacy-preserving cross-domain detection scheme for SDNs. To evaluate the performance of Predis in a comprehensive way, we select three state-of-the-art methods for comparison, i.e., SVM, SOM, and PSOM. Kokila et al. [20] leverage SVM to perform DDoS attack detection in SDNs, achieving a high accuracy rate. Braga et al. [30] use SOM to perform DDoS attack detection in SDNs. PSOM, proposed by Bian et al. [22], is a cross-domain DDoS detection scheme that uses SOM as the classifier and introduces privacy preservation. Besides, linear kNN (kNN) is implemented to show the speed improvement of the kNN variant in Predis. We also disable the privacy-preserving function in Predis and name the result PkNN, and Naive Bayes (NB) is implemented as well for a broader view.

Comparison Criteria. The fundamental goal of attack detection is accuracy (i.e., identifying more anomalies in the ground truth and avoiding false alarms) [49]. We use precision (P) and recall (R) to measure the detection accuracy.

7.2 Evaluation of Classifier Performance

(1) Selection of the Best k Value.

To find the best k value for the improved kNN in Predis, we observe the changes in time consumption, precision, and recall as k increases gradually from 5 to 35, with the privacy-preserving component in Predis temporarily disabled. Subsequently, we determine and select the best k value for our scheme. The experimental data for evaluation is Sample 1 in Dataset 2. The training dataset size is 2400, including 1200 normal traffic instances and 1200 abnormal traffic instances, a scale that remains unchanged in the following experiments.

Experimental results are depicted in Figure 5, where the results are plotted against the k value. Figure 5(a) and (b) exhibit the evaluation results of precision (recall) and time consumption, respectively. We can find that an appropriate k value lies between 20 and 25, where Predis achieves relatively high accuracy without introducing heavy time overhead. Thus, we choose k = 23 in the following experiments.
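For reference, a plain majority-vote kNN (without the KD-tree/BBF speedups or the privacy machinery) that makes the role of k concrete; the toy data is ours:

```python
from collections import Counter

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest
    training points (squared Euclidean distance).

    `train` is a list of (feature_vector, label) pairs.
    """
    by_dist = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

# Toy data: "normal" cluster near the origin, "attack" cluster far away.
train = [([0, 0], "normal"), ([1, 0], "normal"), ([0, 1], "normal"),
         ([9, 9], "attack"), ([10, 9], "attack"), ([9, 10], "attack")]
for k in (1, 3, 5):
    assert knn_predict(train, [0.5, 0.5], k) == "normal"
    assert knn_predict(train, [9.5, 9.5], k) == "attack"
```

Larger k smooths out noisy neighbors but costs more per query, which is the accuracy/time trade-off the figure explores.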

(a) Precision (b) Time Consumption
Figure 5: Precision and Time Consumption by Varying .

(2) Classifier Performance.

To better comprehend the speed and accuracy of the classifier in Predis, we conduct a comparison between Predis, PkNN, kNN, PSOM, SOM, SVM and NB using Dataset 3, whose results are depicted in Figure 6. As can be seen from Figure 6(a), compared to the other algorithms, kNN has higher accuracy, but its speed has no advantage.

Although we use some algorithms (KD-tree and BBF) to improve its speed, it is still not the fastest algorithm, as shown by the PkNN speed in Figure 6(b). An important reason for choosing kNN as the classifier in Predis is that kNN is easy to calculate (cf. Section 4.1), which facilitates embedding the encryption steps into it; another reason is that kNN has relatively higher accuracy, as confirmed in this experiment. Besides, Predis and PSOM have relatively high time consumption because of the privacy protection process.

(a) Evaluation of Accuracy (b) Evaluation of Time Consumption
Figure 6: Comprehensive Evaluation of the Classifier in Predis.

7.3 Evaluation of Accuracy

(1) Accuracy evaluation between single-domain and cross-domain scenarios.

Cross-validation is conducted using Dataset 1. From the results in Table 10, we can see that the precision and recall of single-domain detection are lower than those of cross-domain detection, which supports our stance that cross-domain DDoS attack detection is truly necessary. We also observe that Predis is superior to PSOM in terms of precision and recall. In domain 172.16.115.*, Predis and PSOM both have low precision and recall. The reason lies in that fewer hosts are invaded by attackers in domain 172.16.115.*, and the invaded hosts have not generated much abnormal traffic; the lack of training data leads to less outstanding detection results. But Predis's precision and recall are still at least 0.85.

LLDOS 1.0 as Training Dataset LLDOS 2.0.2 as Training Dataset
Precision Recall Precision Recall
Domains Predis PSOM Predis PSOM Predis PSOM Predis PSOM
172.16.112.* 0.9720 0.9713 0.9010 0.5558 0.9730 0.9700 0.9000 0.6734
172.16.113.* 0.9973 0.9600 0.9930 0.6700 0.9914 0.9651 0.9916 0.8100
172.16.114.* 0.9900 0.9843 0.9874 0.7800 0.9255 0.7630 0.8500 0.7180
172.16.115.* 0.8980 0.8741 0.8865 0.8700 0.8690 0.8053 0.8500 0.8156
Victim 0.9710 0.8160 0.9750 0.7071 0.8320 0.8086 0.7020 0.6700
Attacker 0.9200 0.7753 0.9200 0.7200 0.9052 0.4133 0.8700 0.6400
Domains Merge 0.9985 0.9780 0.9923 0.9653 0.9920 0.8986 0.9812 0.8200
Table 10: Accuracy Comparison between Single-Domain and Cross- Domain Approaches

(2) Accuracy evaluation of detecting DDoS attacks at early stages.

It is desirable to detect DDoS attacks at the first and second stages. We conduct cross-validation on Dataset 1 and divide Dataset 1's traces into three stages (scanning, intrusion, and attacking). At both the scanning and intrusion stages, Predis delivers excellent detection results, as shown in Table 11, which means that Predis can identify attacks at early stages. In contrast to PSOM, Predis achieves better detection results at every stage of the attack, with its results at the scanning and intrusion stages only slightly below those at the attacking stage. Moreover, the precision of Predis reaches 0.9919 at the attacking stage when LLDOS 1.0 is the training dataset.

LLDOS 1.0 as Training Dataset LLDOS 2.0.2 as Training Dataset
Precision Recall Precision Recall
Domains Predis PSOM Predis PSOM Predis PSOM Predis PSOM
Scanning 0.9734 0.8730 0.8645 0.7630 0.8612 0.8053 0.9251 0.8945
Intrusion 0.9920 0.8740 0.9900 0.8821 0.9400 0.8392 0.8830 0.8755
Attacking 0.9919 0.9733 0.9614 0.9200 0.9808 0.9472 0.8792 0.8740
Table 11: DDoS Attacks Detection Accuracy at Each Stage

(3) Accuracy evaluation with privacy-preserving or without.

Using Dataset 2 for cross-validation, we make a comparison between Predis and the SVM and SOM methods. To evaluate the impact of the privacy preservation in Predis on detection accuracy, we also disable the privacy-preserving function in Predis and refer to this variation as CAD in the comparison.

Experimental results are plotted in Figure 7, where the precision and recall are exhibited in two subfigures, respectively. The x-axis indicates the serial number in cross-validation.

When detailing Predis in Section 5.4, we proved through theoretical analysis that introducing privacy protection has no impact on detection accuracy. This conclusion is validated in Figure 7, where Predis and CAD obtain the same detection accuracy. In addition, we can find that Predis outperforms the other two methods in terms of both precision and recall.

(a) Precision (b) Recall
Figure 7: The Results of Precision and Recall of Each Cross-Validation Using Dataset 2.

7.4 Evaluation of Scalability

Using Dataset 1, we record the changes of time consumption and accuracy when the number of domains goes up from one to six, as shown in Table 12 and Figure 8.

When the number of domains increases, time consumption is determined only by the amount of traffic instances, and the detection effect remains almost unchanged. The time complexity of the improved kNN is O(log n) per query on average, where n is the number of training instances. As can be seen from the experimental results, the time consumption of Predis does not surge, but varies linearly as the number of domains increases.

LLDOS 1.0 as Training Dataset LLDOS 2.0.2 as Training Dataset
Precision Recall Precision Recall
Number of domains Predis PSOM Predis PSOM Predis PSOM Predis PSOM
1 0.9720 0.9713 0.9010 0.5558 0.9730 0.9700 0.9000 0.6734
2 0.9840 0.9672 0.9769 0.6530 0.9207 0.9270 0.9501 0.9468
3 0.9912 0.9760 0.9800 0.6734 0.9232 0.9105 0.9543 0.9208
4 0.9914 0.9613 0.9900 0.7133 0.9308 0.9256 0.9683 0.9660
5 0.9920 0.9514 0.9913 0.8320 0.9591 0.8512 0.9537 0.9201
6 0.9880 0.9438 0.9864 0.7689 0.9508 0.8380 0.9517 0.9106
Average 0.9864 0.9618 0.9709 0.6994 0.9424 0.9037 0.9463 0.8896
Table 12: Evaluation of Detection Accuracy with Varied Numbers of Domains Using Dataset 1
(a) LLDOS 1.0 as Training Dataset (b) LLDOS 2.0.2 as Training Dataset
Figure 8: Evaluation of Time Consumption with Varied Numbers of Domains Using Dataset 1.

7.5 Evaluation of Time Consumption

First, from Figure 8, we can see that as the number of domains increases, Predis has obvious advantages over PSOM in terms of time consumption.

We analyze the MAWI dataset [50], which is based on network traffic collected over 7 years on a specific link between Japan and the USA. The backbone generates about 6,000 KB (130,000 flows) of traffic per second. If we can process 13,000 flows per second (1/10 of the traffic generated by this backbone link), Predis meets the time consumption requirement. The time spent is the sum of the time spent on CS and DS. We record the time spent on the two servers as the number of test instances increases from 1,000 to 10,000, using Dataset 2. From Figure 9(a) we can see that with 10,000 flows for testing, the total time does not exceed 1 second. Even with privacy preservation, the time consumption of Predis is acceptable.

In addition, using Dataset 2, we record the time spent in the 6 cross-validation experiments, with PSOM as the control group; the results are shown in Figure 9(b). We can see in Figure 9(b) that, compared to the similar scheme PSOM, Predis has lower time consumption.

(a) Time Consumption with Varied Numbers of Flows (b) Evaluation of Time Consumption Using Dataset 2
Figure 9: Evaluation of Time Consumption.

7.6 Evaluation of Detecting Other Attacks

We hope that the attack detection scheme will detect not only DDoS attacks but also other attacks. We verify through evaluation that Predis is not only suitable for detecting DDoS attacks but is also capable of detecting a variety of attacks, retaining excellent accuracy after the feature selection model is changed appropriately. We conduct experiments using the KDD Cup 1999 traces, which contain several other types of attacks besides DDoS attacks. The feature selection model here is a four-tuple.

(1) Time consumption evaluation when detecting other attacks.

We detect three types of attacks in the KDD Cup 1999 traces, namely DOS, Prob, and U2R, and record the time taken as well as the accuracy. Results are depicted in Table 13.

DOS Prob U2R
Criterion Predis PSOM Predis PSOM Predis PSOM
Precision 0.9161 0.8872 0.9901 0.8753 0.8834 0.8810
Recall 0.9003 0.8706 0.9874 0.9644 0.7900 0.7650
Total Time 616ms 707ms 785ms 833ms 569ms 640ms
Table 13: Time Consumption on Detecting Other Attacks

(2) Accuracy evaluation when detecting other attacks.

We detect three types of attacks in the KDD Cup 1999 traces, namely DOS, Prob, and U2R, and record the accuracy in Table 14, where SOM and SVM are compared.

DOS Prob U2R
Criterion Predis SOM SVM Predis SOM SVM Predis SOM SVM
Precision 0.9161 0.8872 0.9024 0.9901 0.8753 0.9732 0.8834 0.8310 0.8604
Recall 0.9103 0.8706 0.9041 0.9874 0.9644 0.9566 0.7900 0.7650 0.7622
Table 14: Evaluation of Accuracy in Detecting Other Attacks

When detecting DoS, Prob, and U2R attacks, the precision of Predis is no less than 88%. Compared to PSOM, the other privacy-preserving scheme, Predis is better in terms of time consumption (as shown in Table 13). Because the KDD Cup 1999 traces contain very few U2R abnormal traffic instances, recall is relatively low when testing U2R attacks. Compared with SVM and SOM, Predis wins out as far as precision and recall are concerned (as shown in Table 14). Through these two experiments, it has been proven that Predis not only detects DDoS attacks excellently but also detects other attacks very well when appropriately selected features are incorporated.

8 Discussion

If SDN controllers can learn of DDoS attacks in their early stages in time and take corresponding measures (e.g., limiting SYN/ICMP traffic, filtering specific IP addresses, traffic cleaning, etc.), they can stop the attacks before damage is done. The centralized control of traffic forwarding in SDNs makes it easier to stop DDoS attacks. Predis has been demonstrated to detect DDoS attacks in the early stages of an attack (as shown in Table 11). Once the server detects a DDoS attack, it alerts the controller in time. The controller immediately responds to the alert and prevents any further damage from the attack.

The roles in Predis include the two servers and the SDN controllers. Predis can withstand collusion between one or more domains and one server, so the two servers can be deployed in any of the participating domains, or anywhere that provides secure communication with the domains over the TLS protocol. The data plane and the control plane of SDNs communicate via the control-data-plane interface (CDPI); the main uniform communication standard in use is the OpenFlow protocol. Flow table operations, including flow table pretreatment, communication with the servers, and proper handling of DDoS alerts or other operations, can be deployed and implemented through the southbound interface (i.e., the CDPI).

Due to the development of different network access technologies and different communication systems, resource scheduling and fusion in heterogeneous networks have become hot topics. SDNs can achieve unified management and configuration of heterogeneous network equipment and open a variety of interfaces. But these interfaces may be exploited by attackers, e.g., for tapping, interception, and DDoS attacks. We can regard a heterogeneous network as a domain under the unified management of an SDN. Using Predis, users can achieve secure cross-domain DDoS attack detection and resist the threats of DDoS attacks in heterogeneous networks. In addition, Predis's idea of combining data perturbation with data encryption to provide cross-domain privacy protection may be transplanted to other application scenarios in the future to realize other forms of secure multiparty computation. Overlay networks such as P2P (peer-to-peer) networks add virtual channels to physical networks to enhance network flexibility. Each node in a P2P network may be a data provider. When monitoring the traffic of multiple nodes without compromising privacy, each node in the P2P network can be considered a domain, and the idea of Predis in cross-domain detection can be used to provide a secure monitoring service.

9 Conclusion

In this paper, we presented an SDN-based cross-domain attack detection scheme with privacy protection. We studied cross-domain privacy protection problems and DDoS attack detection based on SDNs. We combined geometric transformation and data encryption methods with a view to protecting privacy. We broke the detection process down into two steps, disturbance and detection, and introduced two servers that work together to complete the detection process. We optimized kNN for low time consumption and high accuracy. Extensive experimental results showed that Predis is capable of detecting cross-domain anomalies while preserving privacy, with low time consumption and high accuracy. We plan to further reduce the time consumption of Predis in attack detection in future work.

References

  • [1] D. B. Rawat and S. R. Reddy, “Software defined networking architecture, security and energy efficiency: A survey,” IEEE Communications Surveys & Tutorials, vol. 19, no. 1, pp. 325–346, 2017.
  • [2] Ponemon Institute. [Online]. Available: https://www.ponemon.org/
  • [3] J. Mirkovic and P. Reiher, “A taxonomy of ddos attack and ddos defense mechanisms,” ACM SIGCOMM Computer Communication Review, vol. 34, no. 2, pp. 39–53, 2004.
  • [4] The Lincoln Laboratory of MIT. [Online]. Available: https://www.ll.mit.edu/ideval/data/2000data.html
  • [5] P. Zhang, X. Huang, X. Sun, H. Wang, and Y. Ma, “Privacy-preserving anomaly detection across multi-domain networks,” in Fuzzy Systems and Knowledge Discovery (FSKD), 2012 9th International Conference on.   IEEE, 2012, pp. 1066–1070.
  • [6] A. Soule, H. Ringberg, F. Silveira, J. Rexford, and C. Diot, “Detectability of traffic anomalies in two adjacent networks,” in PAM.   Springer, 2007, pp. 22–31.
  • [7] L. Shiying, “Privacy sensitive deep packet inspection method,” Master’s thesis, Beijing Institute of Technology, 2015.
  • [8] E. Glatz, S. Mavromatidis, B. Ager, and X. Dimitropoulos, “Visualizing big network traffic data using frequent pattern mining and hypergraphs,” Computing, vol. 96, no. 1, pp. 27–38, 2014.
  • [9] Y. Meidan, M. Bohadana, A. Shabtai, J. D. Guarnizo, N. O. Tippenhauer, and Y. Elovici, “Profiliot: A machine learning approach for iot device identification based on network traffic analysis,” in Symposium on Applied Computing, 2017, pp. 506–509.
  • [10] Z. M. Fadlullah, F. Tang, B. Mao, N. Kato, O. Akashi, T. Inoue, and K. Mizutani, “State-of-the-art deep learning: Evolving machine intelligence toward tomorrow’s intelligent network traffic control systems,” IEEE Communications Surveys & Tutorials, vol. 19, no. 4, pp. 2432–2455, 2017.
  • [11] T. T. Oo and T. Phyu, “Statistical anomaly detection of ddos attacks using k-nearest neighbour,” IJCCER, vol. 2, no. 1, pp. 06–11, 2014.
  • [12] J. David and C. Thomas, “Ddos attack detection using fast entropy approach on flow- based network traffic,” Procedia Computer Science, vol. 50, no. 4, pp. 30–36, 2015.
  • [13] A. R. Yusof, N. I. Udzir, and A. Selamat, “An evaluation on knn-svm algorithm for detection and prediction of ddos attack,” in International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems.   Springer, 2016, pp. 95–102.
  • [14] A. Saied, R. E. Overill, and T. Radzik, “Detection of known and unknown ddos attacks using artificial neural networks,” Neurocomputing, vol. 172, pp. 385–393, 2016.
  • [15] N. A. Singh, K. J. Singh, and T. De, “Distributed denial of service attack detection using naive bayes classifier through info gain feature selection,” in Proceedings of the International Conference on Informatics and Analytics.   ACM, 2016, p. 54.
  • [16] C. J. Hsieh and T. Y. Chan, “Detection ddos attacks based on neural-network using apache spark,” in International Conference on Applied System Innovation, 2016, pp. 1–4.
  • [17] S. Wei, Y. Ding, and X. Han, “Tdsc: Two-stage ddos detection and defense system based on clustering,” in Ieee/ifip International Conference on Dependable Systems and Networks Workshop, 2017, pp. 101–102.
  • [18] B. Wang, Y. Zheng, W. Lou, and Y. T. Hou, “Ddos attack protection in the era of cloud computing and software-defined networking,” in IEEE International Conference on Network Protocols, 2014, pp. 624–629.
  • [19] S. M. Mousavi and M. St-Hilaire, “Early detection of ddos attacks against sdn controllers,” in Computing, Networking and Communications (ICNC), 2015 International Conference on.   IEEE, 2015, pp. 77–81.
  • [20] R. Kokila, S. T. Selvi, and K. Govindarajan, “Ddos detection and analysis in sdn-based environment using support vector machine classifier,” in Advanced Computing (ICoAC), 2014 Sixth International Conference on.   IEEE, 2014, pp. 205–210.
  • [21] Q. Yan, F. R. Yu, Q. Gong, and J. Li, “Software-defined networking (sdn) and distributed denial of service (ddos) attacks in cloud computing environments: A survey, some research issues, and challenges,” IEEE Communications Surveys & Tutorials, vol. 18, no. 1, pp. 602–622, 2016.
  • [22] H. Bian, L. Zhu, M. Shen, M. Wang, C. Xu, and Q. Zhang, “Privacy-preserving anomaly detection across multi-domain for software defined networks,” in International Conference on Trusted Systems.   Springer, 2015, pp. 3–16.
  • [23] R. Bost, R. A. Popa, S. Tu, and S. Goldwasser, “Machine learning classification over encrypted data,” in Network and Distributed System Security Symposium, 2015.
  • [24] M. D. Cock, R. Dowsley, C. Horst, R. Katti, A. Nascimento, W. S. Poon, and S. Truex, “Efficient and private scoring of decision trees, support vector machines and logistic regression models based on pre-computation,” IEEE Transactions on Dependable & Secure Computing, vol. PP, no. 99, pp. 1–1, 2017.
  • [25] V. Bijalwan, P. Kumari, J. Pascual, and V. Bhaskar Semwal, “Knn based machine learning approach for text and document mining,” International Journal of Database Theory & Application, vol. 7, no. 1, 2014.
  • [26] L. Schiaffino, A. R. Muñoz, J. F. Villora, M. Bataller, A. Gutiérrez, I. M. Torres, V. Teruel-Martí, and J. G. Martínez, “Feature selection for knn classifier to improve accurate detection of subthalamic nucleus during deep brain stimulation surgery in parkinson’s patients,” in VII Latin American Congress on Biomedical Engineering CLAIB 2016, Bucaramanga, Santander, Colombia, October 26th -28th, 2016.   Singapore: Springer Singapore, 2017, pp. 441–444.
  • [27] M. Yesilbudak, S. Sagiroglu, and I. Colak, “A novel implementation of knn classifier based on multi-tupled meteorological input data for wind power prediction,” Energy Conversion and Management, vol. 135, pp. 434 – 444, 2017. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0196890416311888
  • [28] S. Nanda, F. Zafari, C. Decusatis, E. Wedaa, and B. Yang, “Predicting network attack patterns in sdn using machine learning approach,” in Network Function Virtualization and Software Defined Networks, 2017.
  • [29] Y. Xu and Y. Liu, “Ddos attack detection under sdn context,” in Computer Communications, IEEE INFOCOM 2016-The 35th Annual IEEE International Conference on.   IEEE, 2016, pp. 1–9.
  • [30] R. Braga, E. Mota, and A. Passito, “Lightweight ddos flooding attack detection using nox/openflow,” in Local Computer Networks (LCN), 2010 IEEE 35th Conference on.   IEEE, 2010, pp. 408–415.
  • [31] W. S. Quamar Niyaz and A. Y. Javaid, “A deep learning based ddos detection system in software-defined networking (sdn),” CoRR, vol. abs/1611.07400, 2016. [Online]. Available: http://arxiv.org/abs/1611.07400
  • [32] A. C. Yao, “Protocols for secure computations,” in Foundations of Computer Science, 1982. SFCS ’82. 23rd Annual Symposium on.   IEEE, 1982, pp. 160–164.
  • [33] X. Shu, D. Yao, and E. Bertino, “Privacy-preserving detection of sensitive data exposure,” IEEE transactions on information forensics and security, vol. 10, no. 5, pp. 1092–1103, 2015.
  • [34] T. Nakamura, S. Kiyomoto, R. Watanabe, and Y. Miyake, “P3mcf: practical privacy-preserving multi-domain collaborative filtering,” in Trust, Security and Privacy in Computing and Communications (TrustCom), 2013 12th IEEE International Conference on.   IEEE, 2013, pp. 354–361.
  • [35] Ö. Dagdelen and D. Venturi, “A multi-party protocol for privacy-preserving cooperative linear systems of equations,” in International Conference on Cryptography and Information Security in the Balkans.   Springer, 2014, pp. 161–172.
  • [36] G. Neugebauer, U. Meyer, and S. Wetzel, “Fair and privacy-preserving multi-party protocols for reconciling ordered input sets,” Information Security, pp. 136–151, 2011.
  • [37] Q. Chen, C. Qian, and S. Zhong, “Privacy-preserving cross-domain routing optimization-a cryptographic approach,” in Network Protocols (ICNP), 2015 IEEE 23rd International Conference on.   IEEE, 2015, pp. 356–365.
  • [38] M. Burkhart, M. Strasser, D. Many, and X. Dimitropoulos, “Sepia: privacy-preserving aggregation of multi-domain network events and statistics,” Usenix Security Symposium, 2010.
  • [39] W. K. Wong, D. W.-l. Cheung, B. Kao, and N. Mamoulis, “Secure knn computation on encrypted databases,” in Proceedings of the 2009 ACM SIGMOD International Conference on Management of data.   ACM, 2009, pp. 139–152.
  • [40] Y. Elmehdwi, B. K. Samanthula, and W. Jiang, “Secure k-nearest neighbor query over encrypted data in outsourced environments,” in Data Engineering (ICDE), 2014 IEEE 30th International Conference on.   IEEE, 2014, pp. 664–675.
  • [41] N. Cao, C. Wang, M. Li, K. Ren, and W. Lou, “Privacy-preserving multi-keyword ranked search over encrypted cloud data,” IEEE Transactions on parallel and distributed systems, vol. 25, no. 1, pp. 222–233, 2014.
  • [42] J. Katz and Y. Lindell, Introduction to modern cryptography, ser. CRC Cryptography and Network Security Series.   CRC press, 2014.
  • [43] T. Cover and P. Hart, “Nearest neighbor pattern classification,” IEEE Transactions on Information Theory, vol. 13, no. 1, pp. 21–27, 1967.
  • [44] R. Sedgewick and K. Wayne, Algorithms, 4th Edition.   Addison-Wesley, 2011.
  • [45] T. Dierks, “The transport layer security (tls) protocol version 1.2,” 2008.
  • [46] The CAIDA UCSD. [Online]. Available: http://www.caida.org/data/passive/ddos-20070804_dataset.xml
  • [47] The CAIDA UCSD. [Online]. Available: http://www.caida.org/data/passive/passive_2008_dataset.xml
  • [48] KDD Cup. [Online]. Available: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
  • [49] D. Liu, Y. Zhao, H. Xu, Y. Sun, D. Pei, J. Luo, X. Jing, and M. Feng, “Opprentice: towards practical and automatic anomaly detection through machine learning,” in Proceedings of the 2015 ACM Conference on Internet Measurement Conference.   ACM, 2015, pp. 211–224.
  • [50] MAWILab. [Online]. Available: http://www.fukuda-lab.org/mawilab/v1.1/index.html