Denial of Service (DoS) and Distributed DoS (DDoS) attacks can cause serious damage to any networked system. Recently, new DDoS attack variants such as stealthy and silent saturation DDoS attacks (also named low-rate and slow DoS attacks) have been observed. Current networks do not yet provide sufficient and efficient countermeasures to defend against these types of attacks. A global view of the network state and traffic situation is required for effective attack mitigation. The Software Defined Networking (SDN) concept is a promising approach to tackle the attack detection and mitigation problem, as it can provide a fine-grained view of the traffic flows by appropriately setting the flow matching fields in the SDN switches. Although SDN offers great potential to defend against novel DoS attack types such as stealthy DoS attacks, it is also vulnerable to these attacks, as illustrated in Fig. 1. For example, SDN-based forwarding devices, i.e., OpenFlow switches, can suffer from overflow problems caused by a silent saturation DoS attack [4, 2, 5].
Stealthy DoS attacks are very difficult for network operators to detect without access to the victim machine, as the attacker behaves similarly to clients with a bad network connection. Even though there are some research efforts [4, 5, 7, 8, 9, 10] to detect silent DoS attacks, network operations currently still rely on random, predefined threshold-based or complex high-effort mechanisms. None of these mechanisms utilizes machine learning techniques to provide an early detection of stealthy DoS attacks.
Therefore, in this paper, we propose a novel machine learning-based defense framework called Q-MIND to effectively detect and mitigate stealthy DoS attacks in SDN-based networks. We first analyze the adversary model of stealthy DoS attacks, the related vulnerabilities in SDN-based networks and the key characteristics of stealthy DoS attacks. Next, we describe and analyze a detection system that uses a Reinforcement Learning approach based on Q-Learning in order to maximize its detection performance. Finally, we outline the complete Q-MIND defense framework, which incorporates the optimal policy derived from the Q-Learning agent to efficiently defeat stealthy DoS attacks in SDN-based networks.
The paper is structured as follows. Section II provides some information about stealthy DoS adversary models and existing countermeasures. Section III outlines the architecture of a machine learning-based stealthy DoS attack detection system. Section IV presents our approach for optimizing the attack detection performance via Q-Learning. The complete Q-MIND framework is described in Section V, and the results of the performance analysis are outlined in Section VI. Section VII concludes the paper.
II Stealthy DoS Attacks
II-A Basic Principle
Recently, a new type of Denial-of-Service (DoS) or Distributed DoS (DDoS) attack was identified: stealthy and silent saturation attacks, also called low-rate and slow DoS attacks. These attacks work differently from high-rate, volumetric DoS attacks. Instead of sending requests to the victim server at the highest possible rate, attackers periodically send requests at a low rate, consuming only few resources, to render the victim server inaccessible. Stealthy DoS attacks are very hard to detect in non-SDN networks without access to the victim machine, as the attacker behaves similarly to clients with a bad network connection. During the attack, attackers may exhibit an ON-OFF attack pattern which comprises consecutive periods of inactivity (called off-time) and activity (called on-time). Once a stealthy DoS attack has seized all memory space for active connections in a Web server, the attacker tries to keep these connections open as long as possible by exploiting the characteristics of either a specific protocol (e.g., HTTP, DNS) or the application software (e.g., PHP, SOAP) [7, 8].
II-B Vulnerabilities in Software Defined Networks w.r.t. Stealthy DoS Attacks
If the SDN control plane implements a simple flow matching strategy, e.g., one using only destination MAC or IP addresses, the ability to track and monitor network traffic for security or forensic analysis is limited. Therefore, in order to detect malicious traffic flows stemming from stealthy DoS attacks, a more sophisticated flow matching comprising further packet header fields (e.g., IP source addresses) has to be deployed. However, such a flow matching strategy leads to many more flow entries in the SDN switch, so that the maximum number of flow entries might be reached quite soon, which would then cause a significant degradation of the forwarding performance or even an outage of the switch. Fig. 1 illustrates stealthy DoS attacks in the SDN environment. An SDN switch is able to simultaneously maintain only a limited number of flow rules, e.g., 2000-3000 flow rules in the case of an OpenvSwitch, as the flow rules are stored in power-hungry and expensive Ternary Content Addressable Memory (TCAM). Therefore, attackers can easily compromise an SDN switch by sending new packets which do not match any current flow rule in the switch. As the current flow rules are preserved, this dramatically increases the number of flow rules in the switch. Consequently, not only does the server become the victim of a stealthy DoS attack, but the SDN control and data plane components also suffer from resource exhaustion [5, 4]. Normally, in the case of stealthy DoS attacks, the number of flow rules does not exceed the flow-table capacity of the SDN switches. Therefore, it is quite challenging for an anomaly detection mechanism to detect these types of attacks [5, 6]. To the best of our knowledge, no previous study has completely solved the stealthy DoS attack detection problem so far.
II-C Existing Countermeasures
The problem of precise detection and mitigation of stealthy DoS attacks has already been addressed in the SDN research community. One proposed method monitors the number of flow entries in SDN switches and, once a threshold is exceeded, randomly drops flow rule entries. However, there is a probability of dropping flows of legitimate clients as well. Another detection technique monitors every incoming flow and calculates suspiciousness scores, but it requires a high effort to store and analyze information about all flows. A further recommendation is to trigger a defense scheme that drops incomplete request flows if the number of incomplete HTTP requests is larger than a threshold; the threshold value, however, might differ from server to server. In addition, in order to prevent attackers from probing idle and hard timeout values in the target SDN-based network, it has been proposed to generate an artificial jitter and set a dynamic timeout for incoming packets. This technique, however, induces the control plane to process more packet-in messages, leading to extra forwarding delays for benign packets. Overflow problems in SDN switches caused by a stealthy DoS attack might be solved by a flow table sharing scheme with neighbor switches whenever an attack occurs. However, in the case of a massive attack, neighbor switches can be flooded by the victim switch and the whole network might suspend operation. Finally, it has been proposed to validate source IP addresses by querying the log: if the number of packets/second and the number of bytes/second sent from a source IP address exceed a threshold, the flow rules related to that source IP address are removed from the SDN switch.
All existing methods mentioned above for stealthy DoS/DDoS attack detection rely on either random, predefined threshold-based or high-effort mechanisms. This motivated us to propose a machine learning-based approach to efficiently detect and mitigate stealthy DoS attacks at an early stage. To the best of our knowledge, our proposal is the first one to apply machine learning-based detection to stealthy DoS attacks.
II-D Characteristics of Stealthy DoS Attacks
One of the key characteristics of stealthy DoS attacks is that it does not matter whether attackers use non-spoofed or spoofed IP addresses to generate malicious requests to a victim server (see Fig. 1). This is because, if the SDN network applies a traffic flow matching mechanism comprising layer 3 (IP addresses) and layer 4 (TCP/UDP ports), abnormal source IP addresses have to be used repeatedly to keep the flow rules related to attack traffic alive. Otherwise, the flow rules of the used sources are removed after a flow idle timeout. Therefore, for traffic anomaly detection it is feasible to rely on source IP addresses and categorise incoming traffic flows accordingly. This motivated us to develop a source-based mechanism for detecting stealthy DoS attacks.
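The idle-timeout argument can be illustrated with a small simulation. The sketch below is a toy model and not part of the framework: the `FlowTable` class, the IP addresses and the timing values are all hypothetical, and one flow rule per source IP is assumed. It shows that a source which re-sends just before each idle timeout keeps its rule installed, while the rule of a source that sends once expires.

```python
from dataclasses import dataclass, field


@dataclass
class FlowTable:
    """Toy flow table keyed by source IP with an idle timeout (hypothetical model)."""
    idle_timeout: float
    last_seen: dict = field(default_factory=dict)

    def expire(self, now: float) -> None:
        # Remove every rule that has been idle for at least idle_timeout seconds.
        self.last_seen = {
            ip: t for ip, t in self.last_seen.items()
            if now - t < self.idle_timeout
        }

    def packet(self, src_ip: str, now: float) -> None:
        # A matching packet refreshes the idle timer; a miss installs a new rule.
        self.expire(now)
        self.last_seen[src_ip] = now

    def rules(self, now: float) -> set:
        self.expire(now)
        return set(self.last_seen)


def simulate(idle_timeout=10.0, attack_period=8.0, horizon=60.0):
    table = FlowTable(idle_timeout)
    # The benign client sends a single request at t = 0.
    table.packet("10.0.0.1", 0.0)
    # The attacker re-sends from the same source every attack_period seconds.
    t = 0.0
    while t <= horizon:
        table.packet("10.0.0.66", t)
        t += attack_period
    # Sources that still own a flow rule at the end of the run.
    return table.rules(horizon)


print(simulate())
```

With the default values only the repeatedly used attacker address still owns a rule at the end, which is why grouping traffic by source IP address is a workable basis for detection.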
III Basic Architecture of the Stealthy DoS Attack Detection System
Fig. 2 (a) shows the architecture model of a stealthy DoS detection/mitigation system residing in the SDN application plane. Regarding the system operation, data is first gathered from the network; for example, the northbound APIs (see Fig. 1) can be used to query the SDN controller for statistics data. Detailed statistics information on individual traffic flows in the SDN switches is periodically collected by the SDN controller. Afterwards, the collected data is post-processed in multiple steps. For each source-specific traffic flow, the feature engineering module extracts a set of features from the collected data, e.g., the average packets per flow, average packet size per flow, packet change ratio, flow change ratio, etc. Out of these features, the optimum ones (identified by the AOS, see below) are taken and fed into the chosen Artificial Intelligence/Machine Learning (AI/ML) algorithm. The task of the AI/ML algorithm is to classify each source IP address, based on the selected features, as either normal or malicious. If an attack source IP address is recognized, the attack mitigation policy creation module formulates a policy for removing the malicious source-specific flow rules and blocking the malicious source. The northbound APIs are utilized again to tell the SDN controller to deploy the attack mitigation policies in the data plane.
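As an illustration of the feature engineering step, the following sketch groups flow statistics by source IP address and derives two of the features named above. The input format (tuples of source IP, packet count and byte count) is an assumption and not the actual statistics schema of the SDN controller, and the exact feature definitions used here (plain means over the flows of one source) are likewise assumed.

```python
from collections import defaultdict
from statistics import mean


def per_source_features(flow_stats):
    """Group flow statistics by source IP and derive two example features.

    flow_stats: iterable of (src_ip, packet_count, byte_count) tuples, a
    hypothetical stand-in for the statistics returned by the SDN controller.
    """
    by_source = defaultdict(list)
    for src_ip, packets, byte_count in flow_stats:
        by_source[src_ip].append((packets, byte_count))

    features = {}
    for src_ip, flows in by_source.items():
        # Per-flow average packet size; flows without packets are skipped.
        sizes = [b / p for p, b in flows if p > 0]
        features[src_ip] = {
            # Mean number of packets over all flows of this source.
            "avg_packets_per_flow": mean(p for p, _ in flows),
            # Mean of the per-flow average packet sizes in bytes.
            "avg_packet_size_per_flow": mean(sizes) if sizes else 0.0,
        }
    return features


stats = [
    ("10.0.0.1", 10, 5000),
    ("10.0.0.1", 20, 10000),
    ("10.0.0.66", 2, 120),
]
for src, feats in per_source_features(stats).items():
    print(src, feats)
```

The resulting per-source feature vectors are what a classifier would consume; the remaining features would be computed in the same per-source fashion.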
The application operator and scheduler (AOS) plays an important role in selecting the optimal feature sets and AI/ML algorithms for an efficient detection operation. The AOS is only active in the training phase, i.e., before the actual runtime operation of the detection system. In the training phase, we require a set of labelled traffic data including abnormal and benign samples. This data is either generated by performing simulation experiments of stealthy DoS attacks in an SDN environment or derived from publicly available data sets. The AOS utilizes the labelled data set to train a chosen AI/ML algorithm with a feature set, and afterwards conducts a cross-validation test to evaluate the attack detection performance for the selected combination of feature set and AI/ML algorithm. In this way, the AOS can adjust these selections to achieve the best attack detection performance. Nonetheless, in practice, selecting an optimal set of features and a suitable AI/ML algorithm is challenging for every classification problem, as there are many possible combinations. Therefore, in the following, we introduce an optimal selection algorithm based on reinforcement learning that can supervise the optimum selection of the features and AI/ML algorithms.
IV Optimum Selection of Features and AI/ML-based Classification Algorithms
In order to achieve the optimum combination of a certain feature set and a specific AI/ML algorithm w.r.t. the anomaly detection performance, we adopt the Markov Decision Process (MDP) approach with episodic tasks. The MDP framework allows the AOS to take an optimal action (combination of feature set and AI/ML algorithm) based on its observations in order to maximize its immediate reward in every single episode. The reward is expressed in terms of multiple evaluation criteria (see below). The MDP is characterized by the tuple $\langle \mathcal{S}, \mathcal{A}, \mathcal{R} \rangle$, where $\mathcal{S}$ is the state space, $\mathcal{A}$ is the action space, and $\mathcal{R}$ is the immediate reward of the detection system. For evaluating the anomaly detection performance of an action (feature set and AI/ML algorithm), we consider common metrics including precision, recall, F-score, accuracy, and false alarm rate. These metrics are calculated from the following observations: TP (True Positive), the number of attacks correctly detected; TN (True Negative), the number of normal patterns correctly classified; FP (False Positive), the number of normal patterns incorrectly classified as attacks; and FN (False Negative), the number of attacks that were not detected. The details of the MDP model are outlined hereafter.
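For reference, the five metrics can be computed from the four counts with their standard textbook definitions, as in the sketch below. The function name and the choice to return 0.0 for an empty denominator are assumptions made for illustration.

```python
def detection_metrics(tp, tn, fp, fn):
    """Return precision, recall, F-score, accuracy and false alarm rate."""

    def ratio(num, den):
        # Guard against empty denominators, e.g. no positive predictions.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)                 # detected attacks that are real
    recall = ratio(tp, tp + fn)                    # real attacks that are detected
    f_score = ratio(2 * precision * recall, precision + recall)
    accuracy = ratio(tp + tn, tp + tn + fp + fn)   # all correct decisions
    false_alarm_rate = ratio(fp, fp + tn)          # normal traffic flagged as attack
    return {
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "accuracy": accuracy,
        "false_alarm_rate": false_alarm_rate,
    }


print(detection_metrics(tp=90, tn=95, fp=5, fn=10))
```

Four of the metrics should be as high as possible, whereas the false alarm rate should be as low as possible, which matters when the metrics are combined into a single score.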
IV-1 State Space
Formally, the state space $\mathcal{S}$ of the detection system is the set of all state vectors $s$ that the AOS can observe as feedback from the detection system.
IV-2 Action Space
Let $\mathcal{F}$ denote a group of feasible feature sets composed of all available and suitable features; e.g., one feature set consists of 4 features (average packets per flow, average packet size per flow, packet change ratio and flow change ratio). Let $\mathcal{M}$ represent a set of possible AI/ML algorithms that can be used for traffic flow classification, e.g., Support Vector Machine [14] and Self Organizing Map. Then, a tuple $(f, m)$ with $f \in \mathcal{F}$ and $m \in \mathcal{M}$ is referred to as a combination of a feature set and an AI/ML algorithm. Applying a tuple to the environment, i.e., the detection system (see Fig. 2 (b)), means that an action is taken by the AOS component. Therefore, the action space is defined as $\mathcal{A} = \{(f, m) : f \in \mathcal{F},\ m \in \mathcal{M}\}$.
IV-3 Immediate Reward Function
As mentioned above, we evaluate the anomaly detection performance of a tuple by five criteria. Hence, we define the immediate reward function of the detection system after the AOS takes an action $a$ in state $s$ as a fitness function (Equation 3) that combines the five criteria in a weighted sum, where $w_1, w_2, w_3, w_4$ and $w_5$ are the weight factors related to the corresponding evaluation criteria, and $\sum_{i=1}^{5} w_i = 1$. Note that after performing an action, the AOS observes the feedback from the environment (detection system), i.e., the state vector and the reward value.
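One weighted sum that is consistent with this description is sketched below. The symbols $Pr$, $Re$, $F$, $Acc$ and $FA$ for precision, recall, F-score, accuracy and false alarm rate are notation assumed here, and so is the use of the complement $1 - FA$, chosen so that a lower false alarm rate raises the reward.

```latex
r(s, a) = w_1\,Pr + w_2\,Re + w_3\,F + w_4\,Acc + w_5\,(1 - FA),
\qquad \sum_{i=1}^{5} w_i = 1, \quad w_i \ge 0 .
```

Under this assumed form the reward lies in $[0, 1]$, and with equal weights it is simply the average of the five terms.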
IV-4 Optimization Formulation
We define an optimization problem to acquire the optimal policy $\pi^*$ that, being in state $s$, maximizes the immediate reward in each episode. In particular, given the state vector $s$, the policy yields an optimal action $a$, i.e., a tuple of feature set and AI/ML algorithm, that maximizes the immediate reward of the detection system as defined by Equation 3. The action space comprises a finite number of possible actions. The optimization problem is then to find the policy $\pi$ that maximizes $r(s_t, \pi(s_t))$, where $r(s_t, \pi(s_t))$ is the immediate reward value associated with policy $\pi$ at time step $t$ in an episode.
For solving the optimization problem, we apply the Q-Learning algorithm, which is a Reinforcement Learning approach. With it, the AOS is able to perform an optimal selection without requiring prior knowledge about a set of features and the associated AI/ML algorithm. In other words, we aim to find the optimal policy $\pi^*$, i.e., a mapping table from states to actions (feature set and AI/ML algorithm), that maximizes the anomaly detection performance for stealthy DoS attacks. To achieve this aim, the AOS builds a Q-table based on the Q-Learning algorithm to store the values of all state-action pair combinations (see Fig. 2 (b)). In a given state $s_t$ at iteration $t$ in an episode, the Q-Learning agent selects an action $a_t$ based on its current selection strategy. Afterwards, it observes the immediate reward $r_t$ and the new state $s_{t+1}$, and updates the Q-table using a Q-function. In other words, the Q-Learning agent learns from its own decisions at each iteration, and it converges to the optimal policy after a certain number of iterations.
Let $V^{\pi}(s)$ denote the expected return of a state $s$ under a policy $\pi$, i.e., the expected sum of the rewards collected from $s$ onwards, each weighted by a power of the discount factor $\gamma \in [0, 1)$, which indicates the importance of the long-term reward. However, as only the immediate reward is considered in our optimization formulation, $\gamma$ is set to 0 in the remainder of the paper. The optimal policy $\pi^*$ in state $s$ selects an action which yields the maximum value $V^*(s)$.
Hence, for every state-action pair $(s, a)$, the optimal Q-function $Q^*(s, a)$ is defined as the expected return obtained by taking action $a$ in state $s$ and following the optimal policy afterwards.
Thus, $V^*(s)$ can be expressed as $V^*(s) = \max_a Q^*(s, a)$. By iteratively conducting different actions, the optimal value of the Q-function, i.e., $Q^*(s, a)$, is found for all state-action pairs $(s, a)$. The Q-function is updated at each iteration by moving the stored value $Q(s, a)$ towards the newly observed target with a step size $\alpha$, the learning rate.
The value of $\alpha$ can be either constant or dynamically adjusted during the learning operation. In addition, to mitigate the exploration and exploitation dilemma, which has a direct impact on the convergence rate of any learning algorithm, the epsilon-greedy algorithm is used. Instead of always taking the best action according to the current state, the Q-Learning algorithm will then occasionally take random actions, and the probability of a random decision is determined by the value of $\epsilon$. The learning operation is terminated when all values in the Q-table have converged.
In conclusion, the Q-Learning algorithm yields the optimal policy $\pi^*(s)$ for a state $s$, i.e., an action $a$ that needs to be taken by the AOS module to maximize the value of the Q-function, i.e., $\pi^*(s) = \arg\max_a Q^*(s, a)$. Algorithm 1 provides details of the Q-Learning algorithm.
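The learning loop can be sketched as follows. This is a minimal illustration and not a reproduction of Algorithm 1. It uses the standard tabular update Q(s,a) ← Q(s,a) + α[r + γ max Q(s',a') − Q(s,a)], which for γ = 0 reduces to Q(s,a) ← Q(s,a) + α(r − Q(s,a)). The `evaluate` callback, the placeholder start state and the fixed number of iterations (instead of a convergence test) are assumptions.

```python
import random


def q_learning(actions, evaluate, iterations=500, alpha=0.5, epsilon=0.1, seed=0):
    """Tabular Q-Learning with discount factor 0 and epsilon-greedy selection.

    actions:  list of hashable actions, e.g. (feature_set, algorithm) tuples.
    evaluate: hypothetical callback that applies an action to the detection
              system and returns (next_state, reward).
    """
    rng = random.Random(seed)
    q_table = {}         # maps (state, action) -> estimated immediate reward
    state = "initial"    # placeholder start state before any feedback exists

    for _ in range(iterations):
        if rng.random() < epsilon:
            action = rng.choice(actions)   # explore: random action
        else:
            # Exploit: best known action for the current state.
            action = max(actions, key=lambda a: q_table.get((state, a), 0.0))
        next_state, reward = evaluate(action)
        old = q_table.get((state, action), 0.0)
        # With gamma = 0 the update target is just the immediate reward.
        q_table[(state, action)] = old + alpha * (reward - old)
        state = next_state
    return q_table


def greedy_action(q_table, state, actions):
    """Return the action with the highest Q-value in the given state."""
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))


if __name__ == "__main__":
    # Toy environment: one state, fixed (made-up) reward per action.
    toy_rewards = {"A": 0.60, "B": 0.90, "C": 0.70}
    table = q_learning(list(toy_rewards), lambda a: ("s", toy_rewards[a]))
    print(greedy_action(table, "s", list(toy_rewards)))
```

In the toy run the rewards are invented numbers; in the framework the reward would come from cross-validating the detection engine with the chosen feature set and classifier.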
V Q-MIND Framework
In this section, the design and operation of the complete Q-MIND framework for detecting and mitigating stealthy DoS attacks is outlined.
V-A Q-MIND Architecture
As depicted in Fig. 3, the Q-MIND framework comprises the following modules: a Data Preprocessing module; a Q-Learning agent residing in the AOS component for optimizing the anomaly detection performance; an AI/ML-based anomaly detection system controlled by the Q-Learning agent; a Knowledge Database for storing information about feasible features and AI/ML algorithms; and an Attack Mitigation Policy Creation module that issues mitigation rules and deploys them in the data plane to block stealthy DoS attack traffic.
V-B Q-MIND Operation
The operation of Q-MIND is described by Algorithm 2. First of all, Q-MIND runs the Q-Learning agent to build a Q-table and then generates the optimum action, i.e., the optimum combination of feature set and AI/ML algorithm, as explained earlier. The labelled data set for the training and cross-validation phase is either obtained from simulation experiments of stealthy DoS attacks in an SDN environment or derived from publicly available data sets. Note that in order to verify the correctness of the anomaly detection, the Q-Learning agent conducts cross-validation tests after having trained the detection engine. After the initialization (training and cross-validation) part (lines 1-3) is finished, Q-MIND enters the runtime phase, detecting and mitigating stealthy DoS attacks by executing the loop part of Algorithm 2. The initialization part as well as the loop part can be adjusted later on if further suitable features are identified by the Data Preprocessing module.
VI Performance Evaluation
In this section, we describe the proof-of-concept implementation of the Q-MIND framework and present some results of our performance analysis.
VI-A Evaluation Scenario Setup
We perform our experiments by using MaxiNet to emulate a simple SDN-based network including a Web server and 8 hosts (4 benign and 4 malicious hosts). The Web server and all hosts are implemented in Linux containers and connect to an OpenFlow switch (OpenvSwitch). The SDN network is controlled by an ONOS SDN controller. We consider three well-known AI/ML-based classifiers: Support Vector Machine (SVM, supervised learning), Random Forest (RF, supervised learning) and Self Organizing Maps (SOM, unsupervised learning); hence, the set of candidate AI/ML algorithms has three elements. The applied feature set is created from the following 10 suitable features: average packets per flow, average packet size per flow, packet change ratio, flow change ratio, average duration per flow, percentage of pair-flows, growth of different ports, average flow inter-arrival time, fraction of TCP flows over total incoming flows and entropy of incoming flows. These features are extracted for each source IP address taken from a traffic data set including 4000 normal traffic samples and 4000 attack samples. The traffic data set is obtained from our simulation of stealthy DoS attacks in the SDN scenario described above. Accordingly, the Q-Learning agent is instructed to train the detection engine and to conduct cross-validation tests. It should be noted that an AI/ML algorithm requires at least 2 features, and that the weight values in Equation 3 are set to 0.2 each.
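To get a feeling for the size of the search space in this setup, the sketch below enumerates all combinations of a feature set and a classifier. It assumes that every subset of the 10 features with at least 2 members counts as a feasible feature set; the set of feature sets actually considered feasible may be smaller.

```python
from itertools import combinations
from math import comb

FEATURES = [
    "average packets per flow",
    "average packet size per flow",
    "packet change ratio",
    "flow change ratio",
    "average duration per flow",
    "percentage of pair-flows",
    "growth of different ports",
    "average flow inter-arrival time",
    "fraction of TCP flows",
    "entropy of incoming flows",
]
ALGORITHMS = ["SVM", "RF", "SOM"]
MIN_FEATURES = 2  # an AI/ML algorithm requires at least 2 features


def action_space(features, algorithms, min_features=MIN_FEATURES):
    """Yield every (feature_set, algorithm) tuple with at least min_features features."""
    for size in range(min_features, len(features) + 1):
        for feature_set in combinations(features, size):
            for algorithm in algorithms:
                yield feature_set, algorithm


n_actions = sum(1 for _ in action_space(FEATURES, ALGORITHMS))
# Closed form: number of algorithms times the number of subsets of size >= 2.
closed_form = len(ALGORITHMS) * sum(
    comb(len(FEATURES), k) for k in range(MIN_FEATURES, len(FEATURES) + 1)
)
print(n_actions, closed_form)  # 3039 3039
```

Under this assumption there are 3 × (2^10 − 1 − 10) = 3039 candidate actions, which illustrates why an exhaustive manual selection is impractical.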
To evaluate the Q-MIND framework, we first compare our optimized anomaly detection solution with the approach of [12], a Genetic Algorithm with an SVM classifier (GASVM) and a Binary Bat Algorithm with an RF classifier (BBARF). We evaluate the stealthy DoS attack detection performance both in the cross-validation and in the runtime phase. In a second step, we compare the stealthy DoS attack mitigation performance of our Q-MIND framework applying the optimal policy with that of a threshold-based SIFT method. For attack mitigation, we delete all flows stemming from source IP addresses that were identified as belonging to attackers and then install flow rules to block these malicious sources for a certain period of time, e.g., 30 seconds.
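The mitigation step described above can be modelled in a few lines. The sketch below is a toy model only: the `Mitigator` class and its in-memory rule list are hypothetical and do not use any real controller API; in the framework the corresponding rules would be deployed through the SDN controller.

```python
class Mitigator:
    """Toy model of the mitigation step: purge attacker flows, then block the source."""

    def __init__(self, block_seconds=30.0):
        self.block_seconds = block_seconds
        self.flow_rules = []      # (src_ip, dst_ip) entries currently in the switch
        self.blocked_until = {}   # src_ip -> time at which the block expires

    def mitigate(self, src_ip, now):
        # Delete every flow rule that stems from the malicious source.
        self.flow_rules = [rule for rule in self.flow_rules if rule[0] != src_ip]
        # Install a temporary block for that source.
        self.blocked_until[src_ip] = now + self.block_seconds

    def is_blocked(self, src_ip, now):
        # A source is blocked until its expiry time has been reached.
        return now < self.blocked_until.get(src_ip, float("-inf"))


switch = Mitigator(block_seconds=30.0)
switch.flow_rules = [("10.0.0.66", "10.0.0.100"), ("10.0.0.1", "10.0.0.100")]
switch.mitigate("10.0.0.66", now=100.0)
print(switch.flow_rules, switch.is_blocked("10.0.0.66", now=110.0))
```

Flows of benign sources stay untouched, which is the main difference from a scheme that drops flow entries at random.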
VI-B Numerical Results and Analysis
VI-B1 Convergence of the selection algorithm (training and cross-validation phase)
First of all, we investigate the stealthy DoS attack detection performance during the training and cross-validation phase of Q-MIND. As can be seen in Fig. 4 (a), the anomaly detection performance of Q-MIND (derived from the fitness function in Equation 3) fluctuates during the first 100 iterations of the training and cross-validation phase, because the Q-Learning agent frequently updates its Q-table at the beginning of the learning phase. Thereafter, it becomes stable and achieves a value of 0.955 for the optimal policy. The other anomaly detection schemes (which operate with fixed classification algorithms and different feature selection techniques) perform well in the first iterations but do not improve any further in the remaining time. As Q-MIND is able to apply different combinations of classification algorithms and feature sets due to the Q-Learning approach, it finally finds the optimum policy that yields the best detection performance. In the considered scenario, the optimal action turns out to be a combination of a 4-feature set and one of the candidate classifiers.
VI-B2 Attack detection performance applying the optimal policy (runtime phase)
In the next step, we perform experiments in the MaxiNet emulation framework to evaluate the stealthy DoS attack detection performance of Q-MIND in the runtime phase. The results are shown in Fig. 4 (b). It can be observed that, using the optimal policy, the Q-MIND framework outperforms the three other methods in all five evaluation criteria and achieves results which are close to the optimal results obtained in the cross-validation phase with the optimal policy. It has been reported that a stealthy DoS attack in which 39.5 unique packets/s are sent to an SDN switch (OpenvSwitch) causes the switch to overflow (table-full event) after just 38.0 seconds (Time to DoS). We record the average detection time of the Q-MIND framework and compare it with the reported Time to DoS values, as shown in Fig. 4 (c). The results show that the Q-MIND framework takes much less time to detect stealthy DoS attacks with different attack rates and that it completely avoids the overflow problem in the switch.
VI-B3 Attack mitigation performance
In order to evaluate the attack mitigation performance, we measure the percentage of correctly dropped malicious flow rules in the switch and the request response time of the Web server when the network is under attack. As shown in Fig. 5 (a), Q-MIND achieves a very good percentage of correctly dropped attack flows because it implements policies as soon as a source IP address is detected as belonging to an attacker. In contrast, the SIFT method randomly drops flows only after the switch has overflowed; hence, malicious flows always remain in the switch. From Fig. 5 (b) one can see that Q-MIND also guarantees an acceptable response time and that it again outperforms the SIFT method.
VII Conclusion
In this paper, we propose a novel machine learning-based framework, named Q-MIND, to effectively defend against stealthy DoS attacks in SDN-based networks. We conduct a comprehensive analysis of the anomaly detection system, which incorporates a Reinforcement Learning scheme (the Q-Learning algorithm) to maximize the anomaly detection performance. Our performance evaluation results demonstrate that Q-MIND, applying the optimal policy from the Q-Learning agent, achieves a higher stealthy DoS attack detection and mitigation performance than currently existing methods.
This work has been performed in the framework of the Celtic-Plus project SENDATE Secure-DCI, funded by the German BMBF (ID 16KIS0481).
References
[1] Q. Yan, F. R. Yu, Q. Gong, and J. Li, “Software-defined networking (SDN) and distributed denial of service (DDoS) attacks in cloud computing environments: A survey, some research issues, and challenges,” IEEE Communications Surveys & Tutorials, vol. 18, no. 1, pp. 602–622, 2016.
[2] E. Cambiaso et al., “Slow DoS attacks: definition and categorisation,” International Journal of Trust Management in Computing and Communications, vol. 1, no. 3-4, pp. 300–319, 2013.
[3] B. A. A. Nunes et al., “A survey of software-defined networking: Past, present, and future of programmable networks,” IEEE Communications Surveys & Tutorials, vol. 16, pp. 1617–1634, third quarter 2014.
[4] T. A. Pascoal, Y. G. Dantas, I. E. Fonseca, and V. Nigam, “Slow TCAM exhaustion DDoS attack,” in IFIP International Conference on ICT Systems Security and Privacy Protection, pp. 17–31, Springer, 2017.
[5] J. Cao et al., “Disrupting SDN via the data plane: a low-rate flow table overflow attack,” in International Conference on Security and Privacy in Communication Systems, pp. 356–376, Springer, 2017.
[6] T. Lukaseder, S. Ghosh, and F. Kargl, “Mitigation of flooding and slow DDoS attacks in a software-defined network,” arXiv preprint arXiv:1808.05357, 2018.
[7] T. Lukaseder et al., “SDN-assisted network-based mitigation of slow DDoS attacks,” CoRR, vol. abs/1804.06750, 2018.
[8] K. Hong et al., “SDN-assisted slow HTTP DDoS attack defense method,” IEEE Communications Letters, vol. 22, pp. 688–691, April 2018.
[9] S. Qiao et al., “Taming the flow table overflow in OpenFlow switch,” in Proceedings of the 2016 ACM SIGCOMM Conference, SIGCOMM ’16, (New York, NY, USA), pp. 591–592, ACM, 2016.
[10] Y. Qian et al., “OpenFlow flow table overflow attacks and countermeasures,” in 2016 European Conference on Networks and Communications (EuCNC), pp. 205–209, June 2016.
[11] CAIDA, “Description of CAIDA data sets, available at www.caida.org.”
[12] A. Silva et al., “Identification and selection of flow features for accurate traffic classification in SDN,” in IEEE 14th International Symposium on Network Computing and Applications, pp. 134–141, Sep. 2015.
[13] R. S. Sutton and A. G. Barto, Introduction to Reinforcement Learning. Cambridge, MA, USA: MIT Press, 1st ed., 1998.
[14] J. Li, Z. Zhao, R. Li, H. Zhang, and T. Zhang, “AI-based two-stage intrusion detection for software defined IoT networks,” IEEE Internet of Things Journal, pp. 1–1, 2019.
[15] T. Kohonen, M. R. Schroeder, and T. S. Huang, eds., Self-Organizing Maps. Berlin, Heidelberg: Springer-Verlag, 3rd ed., 2001.
[16] P. Wette, M. Dräxler, and A. Schwabe, “MaxiNet: Distributed emulation of software-defined networks,” in 2014 IFIP Networking Conference, pp. 1–9, June 2014.
[17] “Description of the ONOS controller, available at www.onosproject.org.”