Timely Detection and Mitigation of Stealthy DDoS Attacks via IoT Networks

06/15/2020 ∙ by Keval Doshi, et al. ∙ University of Michigan, University of South Florida

Internet of Things (IoT) networks consist of sensors, actuators, and mobile and wearable devices that can connect to the Internet. With billions of such devices already on the market, many with significant vulnerabilities, they pose a serious threat to Internet services and to cyber-physical systems that are connected to the Internet. Specifically, due to their existing vulnerabilities, IoT devices are susceptible to being compromised and becoming part of a new type of stealthy Distributed Denial of Service (DDoS) attack, called Mongolian DDoS, which is characterized by its widely distributed nature and small attack size from each source. This study proposes a novel anomaly-based Intrusion Detection System (IDS) that is capable of timely detecting and mitigating this emerging type of DDoS attack. The proposed IDS's capability of detecting and mitigating stealthy DDoS attacks, even with very low attack size per source, is demonstrated through numerical and testbed experiments.

I Introduction

The emergence of the Internet of Things (IoT) has been one of the most significant technological advances of the last decade [intro]. With the development of miniaturized embedded systems and many web services, along with cloud computing, it is possible to make virtually any isolated system communicate with another machine. Moreover, the increased capabilities of new System on Chip (SoC) devices and a drastic reduction in their size have led to an exponential increase in the number of devices that communicate through the Internet. With the number of IoT devices in use today already in the billions and growing exponentially, the amount of data generated and transmitted is increasing proportionally. This has made the IoT paradigm a prime target for a legion of attackers, hackers, cybercriminals and occasionally governments [intro].

Unfortunately, the security of IoT devices has not kept up with hardware development, and more and more vulnerabilities are being detected on a regular basis, leading to security threats and privacy concerns [intro]. For example, compromised devices can be utilized to perform Distributed Denial of Service (DDoS) attacks. DDoS is a type of cyber-attack in which the perpetrator attacks an online service, typically by flooding it with traffic from a large number of sources. Volumetric attacks, as the name suggests, are characterized by an enormous amount of traffic. They normally do not require the attackers themselves to generate a large amount of traffic, which makes them the simplest and most common type of DDoS attack [ddos_types]. In this paper, we consider this type of DDoS attack, especially the stealthy ones, which are challenging to detect and mitigate due to their widely-distributed nature and low-rate anomalous traffic from each source, which can easily bypass traditional filters. In stealthy DDoS attacks, such as the recent Mongolian DDoS attacks [nexusguard], although the increase in traffic from each source is small, the sources are collectively still capable of disrupting the targeted service because they are widely distributed. Although a number of practical solutions have been deployed against DDoS, many problems still exist [bertino2017botnets], especially due to the new genre of DDoS attacks through IoT devices.

I-A DDoS via IoT

There has been a sharp increase in the number of IoT devices, with an estimated 8.4B devices in 2017, which is expected to reach 20B by 2020 [Smart_Grid]. According to a study by Gartner, a high percentage of new businesses and systems will include an IoT component by 2020 [proliferation]. The convenience provided by IoT technologies has led to a wide-scale deployment of a variety of Internet-connected sensors such as thermostats, security cameras, and smart lights, among many others. Unfortunately, the rapid spread of IoT also brings about a proliferation of security risks. Even though IoT is evolving at a rapid pace, it is still very much in its early stage. Hence, at this stage there is a significant risk that hacked IoT devices can be used for nefarious purposes, such as being used as part of a botnet to launch DDoS attacks [bertino2017botnets].

Currently, IoT network security faces four major challenges:

  • (C1) Minimally invasive mitigation: Because of the distributed nature of recent DDoS attacks, it is very difficult to detect the attacking devices. However, the attack should be mitigated with minimal interruption of service to benign users who want to legitimately use the services under attack.

  • (C2) High dimensionality: Considering the large number of devices in a typical IoT network, and the abundant data generated by those devices, computationally efficient solutions that can achieve effective network monitoring, i.e., joint monitoring of devices, are required.

  • (C3) Unknown attack patterns: Since there is a wide range of vulnerabilities that attackers can exploit, and new attack techniques are continuously developed, the predictability of attack patterns is quite low compared to traditional Internet security. Hence, conventional signature-based detection techniques, as well as parametric probabilistic models, are not feasible.

  • (C4) Timely detection and mitigation: Due to the highly interconnected IoT ecosystem, including the Internet and critical infrastructure such as the Smart Grid, and the potentially disastrous effects of cyberattacks, timely detection and mitigation of attacks is crucial.

The following real-world examples illustrate the damage that cyberattacks via IoT can cause.

  1. The Mirai botnet, which was launched in 2016, caused one of the most prolific series of attacks in DDoS history [kolias2017ddos, antonakakis]. This particular botnet infected numerous IoT devices (primarily older routers and IP cameras), and reached data rates higher than 600Gbps. By flooding the DNS provider Dyn, the Mirai botnet took down many popular websites such as Etsy, GitHub, Netflix, Shopify, SoundCloud, Spotify, and Twitter. Mirai took advantage of devices running out-of-date firmware, and relied on the fact that most users do not change the default usernames/passwords on their devices. The release of its source code on a hacker forum (it is now available online [Mirai_Code]) facilitated derivative botnets.

  2. In November 2016, hackers shut down the heating of two buildings in the city of Lappeenranta, Finland. This was another DDoS attack, and in this case the attack specifically targeted a smart-home function. The attackers managed to cause the heating controllers to continually reboot in a loop so that the heaters never worked. The attack was significant because temperatures are well below freezing at that time of the year, and such a scenario can be life threatening.

  3. In early 2017, Verizon Wireless released a report that included an unnamed university that experienced an attack from more than 5,000 IoT devices, such as vending machines and smart light bulbs.

These examples show how severe the situation could become if such botnets are acquired by sophisticated hackers and employed against critical infrastructure such as nuclear plants and smart grids.

I-B Related Works

DDoS attacks via IoT networks have received less attention than other security issues in the IoT environment. However, they have recently attracted considerable interest, e.g., [bertino2017botnets], [kolias2017ddos], [antonakakis]. In [lopez2019network, yusof2019systematic, shtern2014towards, cambiaso2012taxonomy], a wide range of vulnerabilities that cause conventional signature-based detection techniques to fail are discussed. In [Kashi], the authors propose a solution to the UDP flood attack in an IoT environment using 6LoWPAN and IEEE 802.15.4. However, it has high overhead and a complex architecture and components, which do not suit an IoT environment [PrevWork]. An agent-based DDoS mitigation approach is proposed in [Sonar]. The authors propose a two-part algorithm in which attack detection is performed in the border router. A similar entropy-based solution is proposed in [Entropy], but with the requirement that the packet contents can be inspected. It also does not consider scenarios in which the entropy does not change but the number of packets does. Xiang et al. [Informetric] propose an information metric approach to quantify the differences between legitimate and attack network traffic by assuming that the legitimate traffic follows a Gaussian distribution, whereas the attack traffic follows a Poisson distribution. Machine learning algorithms are also gaining attention as recent anomaly detection research shows promise [chandola2009anomaly]. Doshi et al. [doshi2018] present the performance of popular machine learning algorithms such as SVM, k-nearest neighbors, and neural networks in detecting malicious traffic. However, they require training data for malicious traffic (supervised anomaly detection), and extract features that are specific to certain IoT devices without considering other devices that might be present in the network, such as laptops or smartphones. In [unsuper], Nomm et al. propose using feature selection with popular anomaly detection techniques such as one-class SVM to detect IoT botnet attacks. They use three different approaches for extracting useful features and provide their results on the N-BaIoT dataset. Kurt et al. [kurt2018] propose an online anomaly detection algorithm based on dimensionality reduction that is capable of detecting anomalies in high-dimensional settings. They also present their results using the N-BaIoT dataset. Meidan et al. [meidan2018n] propose using deep autoencoders for detecting DDoS attacks at the network level. They achieve small false positive rates by training a deep autoencoder for each individual device in the network, which might not scale well to large networks with many devices. They also use a window-based majority voting scheme to detect attacks, which is not well suited for quick detection.

I-C Contributions

In this paper, we propose a practical anomaly-based detection and mitigation technique for IoT-based DDoS attacks, especially the challenging stealthy DDoS attacks with data rate increase per device as low as 10%, which is significantly lower than the considered rates in the literature, and can easily bypass most of the existing approaches. Specifically, the proposed technique is based on a statistical anomaly detection algorithm called Online Discrepancy Test (ODIT) that mitigates the attack with minimal interruption of regular service; scales well to large systems; does not rely on presumed baseline and attack patterns; and achieves quick and accurate detection and mitigation thanks to its sequential nature. The major contributions of this paper are as follows:

  • A novel detection and mitigation technique for stealthy DDoS attacks is proposed, and its time and space complexity is analyzed;

  • Asymptotic optimality of the proposed detector is proven in the minimax sense as the training data size grows;

  • Solution to a dynamic scenario in which the number of devices in the network changes is provided;

  • A comprehensive performance evaluation is provided using a testbed implementation, the N-BaIoT dataset, and simulations.

We first present the problem formulation in Sec. II, then provide the proposed IDS in Sec. III, experimental and testbed results in Sec. IV, discuss the limitations in Sec. V, and finally conclude the paper in Sec. VI.

II Problem Formulation and Background

II-A System Model

In the considered architecture (Fig. 1), each IoT device sends its data to the node connected to it. Nodes direct the data traffic to a center, such as a web server, data center or utility center. The architecture is scalable in such a way that a node may represent a smart home consisting of tens of devices or a smart building/neighborhood access point consisting of thousands of devices.

Each device typically has different data communication characteristics. In particular, the data content is typically different (for example, a thermostat would have a considerably smaller packet size than a CCTV camera), and the communication protocol used might be different (such as TCP, UDP or HTTP). Also, for the same device, the data rates might differ significantly based on the location or type of connection, e.g., a laptop connected via a fiber optic cable might send 1000 packets per second, whereas the same device would send 10 packets per second on a slower connection. Even the active communication frequency varies considerably, e.g., devices like printers update their status once a minute, whereas CCTV cameras send data every second. In this work, our only assumption is that the devices perform packet-based data communication.

Fig. 1: System model consisting of IoT devices such as thermostat, CCTV, light bulb, smartphone, etc. In the threat model, the bold arrows imply an increased packet rate.

II-B Threat Model

We consider a volumetric DDoS attack scenario in which data rates (packets/sec.) from a number of devices increase at some point in time (see Fig. 1). In particular, we consider a threat model in which some IoT devices are compromised and start to send more data packets than usual. We do not assume further attack specifications such as knowledge of how devices are compromised (e.g., through a vulnerability in the firmware, spoofing attack, man-in-the-middle attack, use of default password), the attack magnitude (i.e., percentage of increase) and duration, or whether the data content changes. Mitigating such attacks at the center is not tractable since the attacking devices cannot all be accurately identified there due to the highly distributed nature of the attacks. Moreover, due to the low-rate nature of the attacks, it is very difficult to detect them locally at the nodes. We propose a general Intrusion Detection System (IDS) that can run locally and is capable of detecting and mitigating an attack even when it is not possible to inspect the data content, which is a prerequisite for many IDS algorithms, e.g., [Entropy], [Limm].

Although the standard volumetric attacks studied in the literature involve a large increase in the data rate of a device, with the increasing number of compromised IoT devices, low-rate stealthy DDoS attacks (e.g., with a 20% increase per device) have become a serious threat [nexusguard]. Due to the proliferation of IoT, cyber-criminals can launch widely-distributed and highly-effective stealthy DDoS attacks that can bypass conventional filters and IDSs. Hence, in this paper we study DDoS attacks with data rate increases as low as 20% per device. There are existing works that consider low-rate DDoS attacks (e.g., [Informetric]); nevertheless, the increase rates they consider are still significantly higher than what we consider in this paper.

As a result of such widely-distributed DDoS attacks (e.g., the almost uniform distribution of attack traffic in Mirai [incap]), it is not tractable to have a single global solution running at the server end. We therefore propose in the next section a local IDS that runs at each node. Such local solutions also facilitate accurate mitigation.

III Proposed Anomaly-Based IDS

In this section, we present the detection and mitigation strategy of the proposed IDS. As shown in Fig. 2, we detect an attack based on the cooperative test statistic, and once an attack is detected, we monitor each device individually to identify the attacking devices (see Sections III-B and III-C). We also analyze the computational complexity and address a practical scenario in which the number of devices in the network is dynamic.

Fig. 2: Proposed Intrusion Detection and Mitigation System

III-A Proposed Detection Strategy

The heterogeneous nature of an IoT network makes parametric anomaly detection approaches for DDoS detection less effective since they assume probabilistic models for nominal and anomalous conditions. In practice, it is difficult to know or estimate the anomalous, and even the nominal, probability distributions. Hence, parametric anomaly-based IDSs, as well as many conventional signature-based IDSs, are not feasible for addressing stealthy DDoS attacks through IoT. Recently, an online and nonparametric detector called the Online Discrepancy Test (ODIT) was proposed for detecting persistent and abrupt anomalies [ODIT]. Thanks to its nonparametric operation, ODIT does not need to know the baseline or anomalous distributions beforehand, and hence can address the challenge (C3) stated in Section I-A. ODIT is a sequential method that accumulates evidence over time and makes a decision at each time based on the evidence accumulated so far, instead of making a hard decision based on a single data point. This sequential nature of ODIT is tailored for timely detection, so it is able to address the challenge (C4). Moreover, ODIT can monitor a large number of devices together (see Sections III-D and IV), which addresses the challenge (C2).

Fig. 3: Illustration of the ODIT procedure with $k=2$. The baseline statistic $L_{(\alpha)}$ and the second training subset $\mathcal{X}^{N_2}$ are used as in (1) for online anomaly detection (see also Fig. 4). Test points are drawn from the same nominal distribution as the training points, which is a two-dimensional Gaussian with independent components.

In this work, we propose a novel modification for ODIT, and prove that this modified version, as the training data size grows, asymptotically becomes the Cumulative Sum (CUSUM) test, which is the optimum sequential change detection algorithm in the minimax sense. CUSUM is a parametric test which assumes both the nominal and the anomalous distributions are completely known [Bass].

We next describe the procedure of the proposed ODIT-based IDS for a node $n$, which observes a $d$-dimensional normalized data vector $x_t$, where $d$ is the number of devices, at each time $t$. Here, in the context of DDoS attack detection, $x_t$ denotes the number of packets received from the devices in the network at time $t$, normalized by the corresponding maximum number of packets for each device. Each dimension of the raw count vector $\tilde{x}_t$ is normalized into $x_t$ to deal with the typical heterogeneity in the data communication characteristics of IoT devices. Then, in Section III-B, we show how to achieve cooperation among nodes.
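The per-device normalization described above can be sketched as follows. This is only an illustration: the function name `normalize_counts`, the choice of taking the per-device maxima from attack-free data, and the sample numbers are all assumptions, not part of the original method description.

```python
def normalize_counts(counts, max_counts):
    """Scale each device's packet count by that device's maximum nominal count."""
    if len(counts) != len(max_counts):
        raise ValueError("one maximum is needed per device")
    # A device with no recorded maximum is mapped to 0.0 to avoid dividing by zero.
    return [c / m if m > 0 else 0.0 for c, m in zip(counts, max_counts)]


# Hypothetical node with a thermostat, a CCTV camera and a laptop.
max_counts = [10, 1000, 500]   # assumed per-device maxima from attack-free data
raw = [5, 900, 250]            # packets observed in one time step
x_t = normalize_counts(raw, max_counts)
```

After this step a low-rate device and a high-rate device contribute on a comparable scale to the distance computations that follow.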

Training: Given an attack-free training dataset $\mathcal{X}^N = \{x_1, \ldots, x_N\}$, which represents the baseline (i.e., nominal) operation, the training procedure is as follows.

  1. Randomly split $\mathcal{X}^N$ into two subsets $\mathcal{X}^{N_1}$ and $\mathcal{X}^{N_2}$ with $N_1$ and $N_2$ points, where $N_1 + N_2 = N$, for computational efficiency, as in the bipartite GEM algorithm [Srichanran].

  2. For each point in $\mathcal{X}^{N_1}$, find the $k$th-nearest-neighbor ($k$NN) distance with respect to the points in $\mathcal{X}^{N_2}$.

  3. For a significance level $\alpha$, store the $(1-\alpha)$th percentile $L_{(\alpha)}$ of the $k$NN distances to use as a baseline statistic for computing the anomaly evidence of test instances.

The training procedure is illustrated in Fig. 3, where the training set is randomly split into a first subset (denoted by green) and a second subset (denoted by blue). In this example, for each point in the first subset, we find the second-nearest-neighbor ($k=2$) distance with respect to the second subset. The largest $k$NN distance among the points in the first subset is used as the baseline statistic $L_{(\alpha)}$.
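The training steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: plain Euclidean distance is used, the percentile is taken as a simple order statistic, and the names `knn_distance` and `train_odit` are hypothetical.

```python
import math
import random


def knn_distance(point, reference, k):
    """Euclidean distance from `point` to its k-th nearest neighbor in `reference`."""
    return sorted(math.dist(point, r) for r in reference)[k - 1]


def train_odit(train, n1, k, alpha, seed=0):
    """Randomly split the attack-free data; return (second subset, baseline statistic)."""
    rng = random.Random(seed)
    shuffled = list(train)
    rng.shuffle(shuffled)
    first, second = shuffled[:n1], shuffled[n1:]
    # kNN distance of every first-subset point with respect to the second subset.
    knn = sorted(knn_distance(p, second, k) for p in first)
    # (1 - alpha)-th percentile as an order statistic (assumed convention).
    idx = min(len(knn) - 1, max(0, math.ceil((1 - alpha) * len(knn)) - 1))
    return second, knn[idx]
```

With a small first subset and a small significance level, the selected percentile is simply the largest kNN distance, which matches the example illustrated in Fig. 3.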

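Testing: When a new data point $x_t$ is observed at time $t$,

  1. Compute the anomaly evidence

    $D_t = d \left( \log L_t - \log L_{(\alpha)} \right)$    (1)

    where $L_t$ is the $k$NN distance of the new data point $x_t$ with respect to the points in $\mathcal{X}^{N_2}$, $L_{(\alpha)}$ is the baseline statistic obtained in the training, and $d$ is the data dimension.

  2. Treating the statistic $D_t$ as positive/negative evidence for an anomaly, accumulate it over time as in CUSUM:

    $s_t = \max\{0, s_{t-1} + D_t\}, \quad s_0 = 0$    (2)

    where $D_t$ is given in (1).

  3. Continue taking new data points while the accumulated evidence is not sufficient for raising an attack alarm, and stop and raise an alarm at the first time $s_t \geq h$, i.e., at time

    $T = \min\{t : s_t \geq h\}$    (3)

    where $h$ is a predetermined threshold.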
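The online test can be sketched as below. This is a sketch under stated assumptions rather than the authors' implementation: the evidence is assumed to take the log-ratio form d(log L_t − log L_(α)), Euclidean distance is assumed, and the names `odit_detect`, `reference`, `baseline` and `threshold` are hypothetical.

```python
import math


def knn_distance(point, reference, k):
    """Euclidean distance from `point` to its k-th nearest neighbor in `reference`."""
    return sorted(math.dist(point, r) for r in reference)[k - 1]


def odit_detect(stream, reference, baseline, k, threshold):
    """Return the first time (1-indexed) the accumulated evidence reaches the threshold."""
    d = len(reference[0])   # data dimension, i.e., number of monitored devices
    s = 0.0                 # accumulated evidence starts at zero
    for t, x in enumerate(stream, start=1):
        dist = max(knn_distance(x, reference, k), 1e-12)       # guard against log(0)
        evidence = d * (math.log(dist) - math.log(baseline))   # assumed evidence form
        s = max(0.0, s + evidence)                              # CUSUM-style update
        if s >= threshold:
            return t
    return None             # no alarm within the observed stream


# Hypothetical toy data: five nominal points followed by five anomalous ones.
reference = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
stream = [(0.05, 0.05)] * 5 + [(1.0, 1.0)] * 5
alarm_time = odit_detect(stream, reference, baseline=0.2, k=2, threshold=5.0)
```

Nominal points yield negative evidence, so the statistic stays at zero, while anomalous points add positive evidence until the threshold is crossed.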

Fig. 4: ODIT statistic $s_t$ and decision procedure using the setup in Fig. 3, with anomalous test points drawn from a uniform distribution. The attack is detected when $s_t$ crosses the shown threshold $h$.

Compared to the anomaly evidence presented in the original ODIT algorithm [ODIT], we use the form given in Eq. (1). This modification enables an asymptotic optimality proof, which is presented in Theorem 1. The intuition behind this modification is to make explicit the analogy between the attack evidence and the log-likelihood ratio.

Theorem 1.

When the nominal distribution is finite and continuous, and the attack distribution is a uniform distribution whose support includes , as the training set grows, the anomaly evidence converges in probability to the log-likelihood ratio,

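$D_t \xrightarrow{p} \log \frac{f_1(x_t)}{f_0(x_t)}$ as $N_2 \to \infty$,    (4)

where $f_0$ and $f_1$ denote the nominal and attack probability densities, $D_t$ is the anomaly evidence, and $N_2$ is the size of the second training subset; i.e., the proposed ODIT detector converges to CUSUM, which is minimax optimum in minimizing the expected detection delay while satisfying a false alarm constraint.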

Proof:

See the Appendix. ∎

Since the proposed detector does not train on anomalous data or assume any model for the anomaly, the uniform distribution condition on the attack distribution for asymptotic optimality is expected.

Parameter Selection: The detection threshold $h$ manifests a trade-off between minimizing the detection delay and minimizing the false alarm rate, as can be seen in Fig. 4. In particular, a smaller threshold facilitates early detection but also increases the probability of false alarm. In practice, $h$ can be chosen to satisfy a given false alarm rate. The number of neighbors $k$ also affects the trade-off between early detection and a small false alarm rate. A smaller $k$ makes the anomaly evidence $D_t$ more sensitive to anomalies, and hence supports earlier detection, but at the same time makes it more prone to false alarms due to nominal outliers. A larger $k$ has the opposite effect. The choice of $N_1$ and $N_2$ is typically skewed towards $N_2$, i.e., $N_2 > N_1$, since $N_2$ determines how closely the $k$NN distance statistic resembles the likelihood under the nominal case, as explained in Theorem 1. The significance level $\alpha$ is an intermediate parameter whose effect can be compensated by the threshold $h$. As a rule of thumb, a small value of $\alpha$ should be selected first, and then $h$ should be set to satisfy a desired false alarm rate.

Remark 1.

A training set that is free of anomaly can be obtained through either human supervision or through an isolated secure system. We should emphasize here the difference between anomaly and outlier. The training set may contain outliers that are generated under no-attack conditions. Outliers correspond to “tail events” that can occur under nominal settings with low probability. Although outliers are rare under normal operations, they can still exist in the training set, and their natural existence does not harm the regular operation of the proposed detector. On the other hand, anomaly is a change in the system behavior, i.e., in the probability distribution of the generated data. In other words, anomaly can be defined as the existence of “persistent outliers”, as opposed to the sporadic nominal outliers.

III-B Cooperative Operation

Multiple nodes running the proposed IDS given in Eq. (1)–(3) can cooperate for earlier detection and mitigation of attacks by leveraging the hierarchical structure shown in Fig. 1. Following the cooperative CUSUM with independence assumption in [mei2010efficient], we propose a cooperative detector that sums the local statistics $s_t^n$ computed at the nodes to obtain a global statistic $s_t = \sum_n s_t^n$. That is, at each time $t$, each node $n$ updates its local statistic $s_t^n$ using (2) and transmits it to the center, which sums them to obtain the global detection statistic $s_t$. Then, the center decides whether or not there is an attack similarly to (3), i.e., raises an alarm at time $T = \min\{t : s_t \geq h\}$, where $h$ is the detection threshold. This cooperative detector can detect attacks earlier than the single-node ODIT detector thanks to spatial diversity, i.e., the accumulated attack evidence from multiple nodes. Note that the statistics $s_t^n$ from nodes without any attacked device typically take small values close to zero, but never become negative according to (2). Thus, they do not negatively contribute to the global statistic $s_t$, and consequently do not cause extra delay in detection.

The cooperative scheme of summing local statistics in a hierarchical architecture enables the proposed detector to easily scale to arbitrarily large networks. Optimal statistical detection would normally require multivariate analysis of all the devices in the entire network. However, due to the abundance of IoT devices this is not feasible, and more importantly, due to the natural hierarchical structure in IoT networks, such a large-scale multivariate analysis is unnecessary. Specifically, IoT devices are grouped under nodes such as smart home routers, and devices under different nodes can typically be modeled independently under no-attack conditions. Although an attack will normally correlate them, it is reasonable to relax that constraint since a practical attack strikes devices asynchronously, i.e., each of the attacked devices has a different attack onset time. This relaxation is critical for CUSUM to be applicable to any multi-dimensional setting since it is not tractable to estimate the set of attacked devices to perform multivariate analysis [mei2010efficient]. After detection, the mitigation procedure given in the next subsection can be applied at each node.
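The benefit of summing local statistics at the center can be illustrated with a small sketch. The function name `cooperative_alarm` and the numbers are hypothetical; each inner list holds the non-negative local statistics reported by the nodes at one time step.

```python
def cooperative_alarm(local_stats_per_time, threshold):
    """Sum the nodes' local statistics at each time step; return the first alarm time."""
    for t, local_stats in enumerate(local_stats_per_time, start=1):
        global_stat = sum(local_stats)   # local statistics are never negative
        if global_stat >= threshold:
            return t
    return None


# Hypothetical statistics from three nodes over three time steps.
history = [
    [0.0, 0.0, 0.0],
    [1.0, 1.5, 0.0],
    [2.0, 2.5, 1.0],
]
alarm_time = cooperative_alarm(history, threshold=5.0)
```

In this toy example no single node reaches the threshold of 5.0 on its own, yet the summed statistic does at the third time step, which is the spatial-diversity effect described above.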

III-C Mitigation Strategy

Timely detection of a DDoS attack is a necessary, but not a sufficient condition for ensuring the security of a system. We need a mitigation strategy that is capable of stopping the DDoS attack by identifying the attacking IoT devices, and then blocking the traffic originating from those devices.

We perform an in-depth analysis to determine which IoT devices are causing the increase. We begin by examining the cooperative statistic calculated in Sec. III-B and determining which nodes are causing the increase. Once the attacking nodes are identified, we examine every dimension, each of which represents an IoT device connected to the node, of the distance $L_t^n$ calculated in Sec. III-A. $L_t^n$ corresponds to the squared Euclidean norm of the $d$-dimensional distance vector whose $i$th entry $\ell_t^{n,i}$ is the distance of the data from device $i$ at time $t$ to the $i$th dimension of the $k$th nearest neighbor. If $\ell_t^{n,i}$ comes out large, then it contributes to a large $L_t^n$ and hence pushes the statistic towards an alarm, which provides evidence that device $i$ is under attack. Hence, after an alarm is raised at time $T$, we

  1. first determine the time instance $\tau$ when the test statistic started to increase since the last time it was zero (see Fig. 4), which can be seen as an estimate of the attack onset time,

  2. then compute the average statistic

    $\bar{s}^n = \frac{1}{T-\tau} \sum_{t=\tau+1}^{T} s_t^n$    (5)

    for each node $n$, and compare it with a threshold $\lambda$ to determine the attacking nodes, i.e., node $n$ has attacked devices if $\bar{s}^n > \lambda$,

  3. then, for each node identified as attacking, compute the average distance

    $\bar{\ell}^{n,i} = \frac{1}{T-\tau} \sum_{t=\tau+1}^{T} \ell_t^{n,i}$    (6)

    for each device $i$ under it, and compare it with a threshold $\gamma$ to decide whether it is attacking, i.e., device $i$ is identified as attacking if $\bar{\ell}^{n,i} > \gamma$.

As usual, the selection of the node and device thresholds controls the balance between the False Positive Rate (FPR) and the True Positive Rate (TPR). As shown in Fig. 15, the proposed mitigation technique achieves high TPR for almost all FPR values, even in the challenging attack scenario investigated in Section IV. We should note that the procedure in Eq. (6) requires some memory to store the most recent distance values for all dimensions local to the node performing the procedure.
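The two-stage identification (nodes first, then devices) can be sketched as follows. This is a minimal sketch assuming that both averages are taken over the window from the estimated onset time to the alarm time; the function name `identify_attackers`, the data layout, and all numbers are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def identify_attackers(node_stats, device_dists, onset, alarm, node_thr, device_thr):
    """Return (node, device) pairs whose averages over times onset+1..alarm exceed the thresholds."""
    blocked = []
    for node, stats in node_stats.items():
        # Index t-1 holds the value at time t, so this slice covers times onset+1..alarm.
        if mean(stats[onset:alarm]) <= node_thr:
            continue   # node looks nominal, so its devices are not examined
        for device, dists in device_dists[node].items():
            if mean(dists[onset:alarm]) > device_thr:
                blocked.append((node, device))
    return blocked


# Hypothetical histories for two nodes over five time steps.
node_stats = {"home": [0, 0, 2, 4, 6], "office": [0, 0.1, 0, 0.2, 0]}
device_dists = {
    "home": {"camera": [0.1, 0.1, 0.9, 1.0, 1.1], "thermostat": [0.1] * 5},
    "office": {"printer": [0.1] * 5},
}
blocked = identify_attackers(node_stats, device_dists, onset=2, alarm=5,
                             node_thr=1.0, device_thr=0.5)
```

Only the per-device distances inside the window need to be kept in memory, which is the storage requirement noted above.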

Combining the detection and mitigation strategies, the proposed IDS technique is summarized in Algorithm 1.

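Initialize: $s_0 \leftarrow 0$, $t \leftarrow 0$
for each node $n$ do
   Partition training set $\mathcal{X}^N$ into $\mathcal{X}^{N_1}$ and $\mathcal{X}^{N_2}$
   Determine $L_{(\alpha)}$
end for
while $s_t < h$ do
   $t \leftarrow t + 1$
   Get new data $x_t^n$ and compute $D_t^n$ as in (1)
   $s_t^n = \max\{0, s_{t-1}^n + D_t^n\}$
   $s_t = \sum_n s_t^n$
end while
Declare attack at $T = t$
for each node $n$ do
   Compute $\bar{s}^n$ as in (5)
   if $\bar{s}^n > \lambda$ then
      for each device $i$ under node $n$ do
         Compute $\bar{\ell}^{n,i}$ as in (6)
         if $\bar{\ell}^{n,i} > \gamma$ then
            Block traffic from device $i$
         end if
      end for
   end if
end for
Algorithm 1 Proposed detection & mitigation algorithm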

III-D Computational Complexity

The following theorem shows that the proposed algorithm can scale well to large systems.

Theorem 2.

The online time complexity and space (i.e., memory or storage) complexity of Algorithm 1 scale linearly with $N_2$, the number of points in the second training set, and $d$, the number of devices, i.e., $O(N_2 d)$. The offline training time complexity is $O(N_1 N_2 d)$.

Proof:

See the Appendix. ∎

Remark 2.

There are efficient ways of finding (approximate) nearest neighbors that scale even better to high-dimensional systems. For instance, in the method proposed in [muja2014scalable], the time complexity of online testing is governed by the maximum number of points examined when finding nearest neighbors. This maximum can be chosen much smaller than the size of the second training set at the expense of decreasing the accuracy of the NN approximation. Hence, a balance between approximation quality and computational complexity should be sought when choosing it. Consequently, by using a fast NN method instead of straightforward computation, the real-time operation capability and/or the scalability of the proposed method can be significantly enhanced.

III-E Dynamic Environments

A major challenge for any anomaly-based intrusion detection system is adaptability to dynamic environments. This means that the system should be adaptable to changes in the environment, while still recognizing abnormal activities. In a dynamic IoT network such as a university or a shopping mall, the number of devices may frequently change based on the number of people, time of the day, day of the year etc. With varying number of devices over time the challenge for the proposed IDS is computing the NN distance under varying number of data dimensions.

A key observation here is that data rates are specific to applications rather than devices. For instance, video streaming has certain data rates regardless of the streaming device. Hence, by considering a list of applications that are used in the network, such as web browsing and music and video streaming, we can deal with the changing number of dimensions. Specifically, we first collect training data for the extreme scenario with the maximum number of devices running each application simultaneously. During online testing, at each time $t$ we modify the training set by ignoring the unused dimensions for each application, and compute the $k$NN distance $L_t$.
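Ignoring unused dimensions at test time can be sketched as below. This is only an illustration: the function name `knn_distance_active`, the `active` index list, and the toy data are assumptions, and Euclidean distance is assumed.

```python
import math


def knn_distance_active(x, reference, active, k):
    """k-th nearest-neighbor distance computed only over the active dimensions."""
    dists = sorted(
        math.sqrt(sum((x[i] - r[i]) ** 2 for i in active)) for r in reference
    )
    return dists[k - 1]


# Hypothetical training points collected with three device slots in use.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 5.0), (0.0, 2.0, 5.0)]
x = (0.0, 0.0, 9.0)   # the third slot is currently unused, so its value is ignored
dist_active = knn_distance_active(x, reference, active=[0, 1], k=2)
```

The training set collected for the maximum number of devices is reused unchanged; only the set of coordinates entering the distance changes over time.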

Since there is a huge number of possible combinations for the number of devices running each application, performing the expensive training procedure (see Theorem 2) for each such combination to compute the baseline statistic $L_{(\alpha)}$ is not feasible. Hence, we propose to build a function approximator for $L_{(\alpha)}$. We collect data for several different combinations and compute $L_{(\alpha)}$ for each of them. Using the number of devices running each application as input, we train a regression model to estimate the value of $L_{(\alpha)}$ for a given combination. The results for Gaussian process regression are shown in Fig. 5. A simpler method (e.g., linear regression) or a more sophisticated method (e.g., a deep neural network) can also be used for the regression model. In Fig. 5, we compare the estimated statistic to the computed one for different combinations with an increasing number of devices. We see that the estimated statistic closely matches that of exact ODIT, which is infeasible to compute for all combinations. Furthermore, the baseline statistic depends on the selected significance level $\alpha$, which is a design parameter. Our simulations show that a small mismatch between the estimated and computed values is not critical for the algorithm's performance.
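As a rough illustration of the function-approximation idea, the sketch below fits the simpler linear-regression alternative mentioned above, not the Gaussian process regression used for Fig. 5. The helper names `fit_linear` and `predict`, the two-application setup, and all numbers are hypothetical.

```python
def fit_linear(X, y):
    """Least-squares fit of y ~ w0 + w.x via the normal equations (pure Python)."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    # Augmented normal-equation system [A^T A | A^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(weights, x):
    """Estimated baseline statistic for a device-count combination x."""
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))


# Hypothetical combinations: (devices browsing, devices streaming) -> baseline statistic.
combos = [(1, 0), (0, 1), (2, 1), (1, 2), (3, 3)]
baselines = [0.1 + 0.2 * a + 0.05 * b for a, b in combos]   # synthetic, exactly linear
weights = fit_linear(combos, baselines)
estimate = predict(weights, (2, 2))
```

The baseline is computed once for a handful of combinations and estimated for the rest, so the expensive training step is not repeated for every combination.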

Fig. 5: Comparison of the estimated statistic to the computed one for different combinations with increasing number of devices.

IV Experimental Results

In this section we evaluate the performance of the proposed IDS using real data, an IoT testbed, and simulations.

IV-A N-BaIoT Dataset

Dataset Properties: Device ID | Device Make and Model | Device Type
Botnet Infections: Mirai | BASHLITE
Considered Bashlite Attacks: Combo | Junk | Scan | TCP | UDP
Considered Mirai Attacks: ACK | Scan | SYN | UDP | UDP Plain
1 | Danmini | Doorbell | - - -
2 | Ennio | Doorbell | x - - - - - - -
3 | Ecobee | Thermostat | - - -
4 | Philips B120N/10 | Baby Monitor | - - -
5 | Provision PT-737E | Security Camera | - - -
6 | Provision PT-838 | Security Camera | - - -
7 | SimpleHome XCS7-1002-WHT | Security Camera | - - -
8 | SimpleHome XCS7-1003-WHT | Security Camera | - - -
9 | Samsung SNH 1011 N | Webcam | x - - - - - - -
TABLE I: Overview of the N-BaIoT dataset properties, botnet infections, and considered attack types
Fig. 6: Comparison between deep autoencoder-based IDS and the proposed ODIT-based IDS in terms of false positive rate (top) and average detection delay (bottom). The x-axis corresponds to the index of the attacked device.

We first consider the N-BaIoT dataset [meidan2018n] (available at the UCI Machine Learning Repository), which contains data from various IoT devices under both nominal and attack conditions. The network consists of nine devices, namely a thermostat, a baby monitor, a webcam, two doorbells, and four security cameras connected via WiFi. The dataset description, as well as the attacks considered in our experiments, are presented in Table I. We do not consider UDP and TCP attacks in our experiments since the provided data does not correlate with typical UDP and TCP attacks.

Our results show that the proposed ODIT-based IDS significantly outperforms the deep autoencoder-based IDS proposed in the N-BaIoT paper [meidan2018n], and by extension Isolation Forest [isolation], SVM [svm] and LOF [lof], which are shown to be outperformed by the autoencoder method. In Fig. 6, we present the false positive rate and the average detection delay when each device is under attack. Here, the average detection delay is analogous to the false negative rate, as all the misclassified anomalous instances contribute to the average detection delay. The proposed ODIT-based method achieves much more accurate and quicker detection than the autoencoder-based method except for device 8. On further analyzing the data, we see that there are a few outliers in the data for device 8, which cause some false alarms. Note that by increasing the decision threshold, we are able to reduce the false positive rate at the cost of a slightly higher detection delay. Conversely, a smaller detection delay can be achieved by setting a lower threshold at the cost of a few more false alarms. Thanks to its sequential nature, the ODIT-based IDS detects attacks right after they occur while satisfying very small false alarm rates. In contrast, the autoencoder-based IDS applies majority voting in a moving window for attack detection, so its detection delay is at least half the window size. The optimum window sizes reported in [meidan2018n] for each device are used for comparisons.
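The threshold trade-off and the window-induced delay discussed above can be sketched as follows. This assumes a CUSUM-like recursion for the sequential detector, in which a running statistic accumulates per-sample evidence, is clipped at zero, and raises an alarm on reaching a threshold `h`. The function names, the evidence values, and the flag sequence are made up for illustration and are not taken from the experiments.

```python
def sequential_alarm(evidence, h):
    """Return the first index at which the running statistic reaches h, else None.
    Assumed recursion: s_t = max(0, s_{t-1} + D_t)."""
    s = 0.0
    for t, d in enumerate(evidence):
        s = max(0.0, s + d)
        if s >= h:
            return t
    return None

def majority_vote_alarm(flags, w):
    """Return the first index at which more than half of the last w
    per-instance flags are anomalous, else None."""
    for t in range(w - 1, len(flags)):
        if sum(flags[t - w + 1 : t + 1]) > w / 2:
            return t
    return None

# Hypothetical per-sample evidence: one isolated spike, then a sustained change.
evidence = [-1.0, -0.5, 2.0, -1.0, 1.5, 2.0, 2.0, 2.0]
low = sequential_alarm(evidence, h=1.5)    # low threshold: fires on the spike
high = sequential_alarm(evidence, h=4.0)   # high threshold: waits for more evidence

# Hypothetical per-instance classifier flags; the attack starts at index 5.
flags = [0] * 5 + [1] * 10
vote = majority_vote_alarm(flags, w=6)
vote_delay = vote - 5
```

With a window of 6 and perfectly classified instances, the vote still needs four anomalous flags inside the window, which is where the half-window lower bound on the delay comes from.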

IV-B Testbed Results

Fig. 7: IoT testbed consisting of NodeMCUs, smart switches, security camera, Amazon Echo Show, laptop, tablet, and Raspberry Pi.
Fig. 8: Setup for hardware implementation.
Fig. 9: Time series for number of packets sent by a computer (top), a NodeMCU (middle), and an Amazon Echo Show (bottom).

We next present a hardware implementation of our proposed attack detection and mitigation algorithm. Even though several datasets are already available, none of them addresses stealthy DDoS attacks. In existing works, even a 300% increase from the nominal data rate of a device is considered low-rate DDoS, e.g., [Informetric]. However, due to the exponential increase in the number of IoT devices, much lower data rates, e.g., a 30% increase, might be sufficient for performing effective DDoS attacks, as exemplified by the recent Mongolian DDoS attacks [nexusguard]. Also, most of the datasets seem to concentrate only on vulnerable devices, but in a typical network there are also devices that cannot be compromised easily and that account for a significant amount of background data. We first present the testbed setup, and then provide the experimental results using the testbed data.

Testbed Setup: To demonstrate a typical IoT network, we collected the network traffic data from devices that were connected via Wi-Fi to an access point, which is wire connected to a router. For sniffing the network traffic, we performed port mirroring on the router, and recorded the data using Wireshark. The goal here is to design a setup that is as close to a real life scenario as possible for studying stealthy DDoS attacks. We consider a network consisting of 15 popular IoT devices, namely a laptop computer, a tablet, 7 NodeMCUs, 4 smart switches, an Amazon Echo Show device, and a security camera as shown in Fig. 7. The purpose of using a computer and a tablet is to consider devices which cannot be easily compromised yet they account for a significant amount of traffic passing through the router. The NodeMCUs, which may represent various IoT devices on the market, are configured to update their status on a local server. A Raspberry Pi acts as a command and control (C&C) server which is used to start and stop attacks. In Fig. 8, we show the setup for our hardware implementation. A comparison between the number of packets transmitted from a computer, a NodeMCU, and an Amazon Echo Show over a period of time is shown in Fig. 9. It is seen that each device has two major operating states: an active state and an idle state. However, there is a stark difference between the transmission patterns for each device.

Fig. 10: Time series for total number of packets sent over the entire network without (top) and with the stealthy Http flooding attack (bottom).
Fig. 11: Live implementation of the proposed IDS in the stealthy Http flooding attack case. The attack starts by increasing the data rate of device 1 by 10% (third from top) and device 2 by 30% (bottom). In such a stealthy attack, there is no visible change in the total number of packets in the network (second from top). The detection statistic of the proposed IDS steadily increases right after the attack, and alarms when it crosses the threshold (top).
Attack Name | Attack Type | Magnitude | Compromised Device
Http Flooding | Application | Low-Rate | NodeMCU
ICMP Flooding | Volumetric | Low-Rate | NodeMCU
Ping of Death | Protocol | Low-Rate | Laptop
UDP Flooding | Volumetric | High-Rate | Laptop
TABLE II: Testbed Attack Characteristics
Fig. 12: Comparison of the proposed ODIT-based IDS and the information metric-based IDS in [Informetric] for attacks presented in Table II.

Attack Models & Results: We implemented 4 different attacks, as shown in Table II. The data was captured in pcap format using Wireshark and is publicly available at https://github.com/kevaldoshi17/DDoSAttackDetection. In Http flooding, the attack slightly increases the mean rates at which two of the NodeMCUs update the server. In particular, there is a small increase in the number of Get requests from these two NodeMCUs. To show that the attack magnitude does not have to be consistent across all devices, we increase the mean packet rate of NodeMCU 1 by 10% and of NodeMCU 2 by 30%. In Fig. 10, we show a comparison between the total number of packets sent over the entire network with and without the stealthy Http attack. It is seen that due to the low-rate nature of the attack, there is no visible increase in the total number of packets. Hence, filter-based methods, which monitor the total number of packets transmitted in the network, would fail to detect such a stealthy attack. In all cases, the input to the proposed IDS is the number of packets of each type from each individual device.

In Fig. 11, we see that the ODIT statistic under the Http attack does not increase considerably before the attack starts, but steadily increases afterwards. This figure is taken from a real-time demonstration which is available online at https://youtu.be/zQexZgB5AMs. By adjusting the threshold we can trade off the detection delay against the number of false alarms. In Fig. 12, we compare the proposed ODIT-based IDS with the information metric-based algorithm proposed in [Informetric] in terms of average detection delay under all attack cases. The method in [Informetric] uses an information distance metric based on the generalized (Rényi) entropy. It uses a window to compute the information metric on the aggregate traffic at each node, which causes a loss in time resolution and in early detection ability. Since ODIT monitors each packet type from every individual device, it is able to detect the attack with a much smaller detection delay for the same false alarm rates.

Similarly, in the other three attack cases, ODIT quickly and accurately detects the attacks by closely monitoring the data traffic of each type from each device, thanks to its multivariate nature. It takes much longer for the information metric IDS to detect the attacks at the same false alarm rate, as there is no significant increase in the total number of packets. Although in ICMP flooding the data rates of the two NodeMCUs are again increased by 10% and 30%, this time it is easier for the information metric method to detect the attack, since the number of ICMP packets in the network is much smaller than the number of Http packets. In the case of the ping of death attack, which results in an increase in the number of ICMP packets, the proposed IDS achieves zero detection delay in all trials. Finally, in the UDP flood attack, we considered a higher attack rate by increasing the nominal data rate of the laptop by 100%. In this case, the performance of the information metric method improves, but ODIT still outperforms it by detecting the attack in under 0.2 seconds on average for a false alarm rate of 0.01.

IV-C Simulation Results

Data Model
IoT Device | Active State Probability | Active State Mean Packet Rate | Idle State Probability | Idle State Mean Packet Rate
Thermostat | 0.25 | 25 | 0.75 | 5
Smart Light | 0.05 | 10 | 0.95 | 5
Security Camera | 1 | 80 | 0 | 0
Smart Printer | 0.05 | 75 | 0.95 | 5
Smart TV | 0.3 | 120 | 0.7 | 10
TABLE III: Nominal data model for different IoT devices.

We finally present simulation results to evaluate the performance of the proposed IDS in a large network with many nodes, where a stealthy DDoS attack from many compromised IoT devices can actually take down a server.

Simulation Setup: The simulation setup consists of multiple nodes, each of which monitors multiple devices (Fig. 1). The IoT devices considered here are those that are most likely to be compromised or that are present in every smart home. The devices are assumed to have two states of operation, an idle state and an active state. The assumed probabilities of the devices being in the idle or active state are given in Table III. We inject attack traffic of different rates into the simulated dataset, and apply the proposed detection and mitigation scheme to diagnose these attacks. We repeat this for different scenarios in which various combinations of devices are attacked, to compute the Average Detection Delay vs. False Alarm Rate (i.e., false alarm probability) performance.

In each node, we assume data from each IoT device is independent and identically distributed (iid) following the pattern given in Table III. The probabilities listed are based on heuristics and standard day-to-day usage of the mentioned devices. For example, a smart TV is considered to be used approximately 7 to 8 hours in a day, so its active state probability is given as 0.3. With the shown probabilities, devices may or may not switch state after a session. We consider the following session durations:

sec. for thermostat, sec. for smart light, sec. for smart printer, sec. for smart TV, and “always on” for security camera. The mean packet rates are determined by considering the amount of data that is transmitted per second and the average packet size. For each device, the number of packets are generated using the mixture of two Gaussian distributions defined by the active state probability, mean packet rates and a common standard deviation, chosen as . To obtain the number of packets, the generated real-valued numbers are rounded to the nearest nonnegative integer. For the purpose of simulations, the training data consists of 40 hours of attack-free data from 100 different devices in each node.
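The nominal data model can be sketched in Python as below, using the probabilities and mean packet rates of Table III. This is a simplified illustration rather than the simulator used for the reported results: it draws the active/idle state independently for every sample instead of holding it for a session duration, and the standard deviation is left as a parameter with a hypothetical value. The names `DEVICES` and `sample_packets` are likewise assumptions.

```python
import random

# (active-state probability, active mean packet rate, idle mean packet rate),
# taken from Table III.
DEVICES = {
    "Thermostat": (0.25, 25, 5),
    "Smart Light": (0.05, 10, 5),
    "Security Camera": (1.0, 80, 0),
    "Smart Printer": (0.05, 75, 5),
    "Smart TV": (0.3, 120, 10),
}

def sample_packets(device, std, rng):
    """Draw one packet count from the two-component Gaussian mixture."""
    p_active, mean_active, mean_idle = DEVICES[device]
    # Pick the state, then draw a real-valued rate around that state's mean.
    mean = mean_active if rng.random() < p_active else mean_idle
    value = rng.gauss(mean, std)
    # Round to the nearest integer and clip negative counts to zero.
    return max(0, round(value))

rng = random.Random(0)
tv_counts = [sample_packets("Smart TV", std=5.0, rng=rng) for _ in range(10)]
```

As a consistency check on Table III, 7 to 8 hours of daily use corresponds to 7/24 ≈ 0.29 to 8/24 ≈ 0.33, which matches the smart TV's active state probability of 0.3.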

Attack Model: Here we consider a practical scenario in which the IoT devices could be under attack, but the node is assumed to be secure. To parameterize the attack size, we consider 10% of the devices to be compromised. The attacked devices are randomly selected to assume a general model. During the attack phase, the data rates of the selected devices are increased slightly by 10%. To demonstrate the effectiveness of such a stealthy attack, we plotted in Fig. 13 the total number of packets received by the server when attacked from 100,000 devices with 10% increase in their data rates.
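A short sketch of this attack model shows why such an attack is hard to see in aggregate traffic. The function `inject_attack` and the uniform nominal rate of 50 packets per second are hypothetical choices for illustration; only the 10% compromised fraction and the 10% rate increase come from the text.

```python
import random

def inject_attack(rates, fraction, increase, rng):
    """Raise the rate of a randomly chosen `fraction` of devices by `increase`
    (0.10 means +10%). Returns the new rates and the attacked indices."""
    n_attacked = round(fraction * len(rates))
    attacked = set(rng.sample(range(len(rates)), n_attacked))
    new_rates = [
        r * (1.0 + increase) if i in attacked else r
        for i, r in enumerate(rates)
    ]
    return new_rates, attacked

rng = random.Random(1)
nominal = [50.0] * 100  # hypothetical: 100 devices at 50 packets/s each
attacked_rates, attacked = inject_attack(nominal, fraction=0.10, increase=0.10, rng=rng)

# Relative change in the total traffic seen at the node.
aggregate_change = sum(attacked_rates) / sum(nominal) - 1.0
```

With equal nominal rates, 10% of the devices sending 10% more traffic raises the node's total by only about 1%, while the combined extra traffic from many such nodes still adds up at the targeted server.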

Fig. 13: Impact of stealthy DDoS attack on the server. The attack consists of 100,000 devices with 10% increase in their data rates.
Fig. 14: Average detection delay vs. False positive rate for the proposed cooperative ODIT-based IDS, the IDS based on cooperative CUSUM [mei2010efficient], and information metric-based IDS [Informetric]. The network consists of 10 nodes each of which has 100 devices connected to it. 10% of the devices attack with 10% increase in data rate with respect to their nominal rates.
Fig. 15: Mitigation performance of ODIT vs. Data filtering method which applies a threshold on the data rates. After detecting an attack, the proposed IDS successfully identifies attacked devices, and blocks their traffic.

Comparisons: We compare our proposed model with an IDS based on cooperative CUSUM [mei2010efficient], which knows the exact parameters of the nominal model and the anomalous model. CUSUM knows exactly the mean and standard deviation of the Gaussian distribution, as well as the probability of being active for each device. Note that due to rounding to the nearest nonnegative integer value, the real probability distribution of the number of packets deviates from the generative bimodal Gaussian. Hence, the proposed ODIT detector sometimes even outperforms CUSUM, which exactly knows the generative Gaussian model. The results for Average Detection Delay vs. False Positive Rate are shown in Fig. 14. We see that the cooperative ODIT-based IDS, proposed in Section III-B, performs better than the clairvoyant CUSUM detector at sufficiently low false alarm rates. It significantly outperforms the information metric method proposed in [Informetric], which monitors the aggregate traffic at each node. In Fig. 14, it is also seen that the cooperation among nodes facilitates earlier detection by our algorithm (ODIT vs. Cooperative ODIT). Through the proposed computationally efficient cooperation scheme, given in Section III-B, the ODIT-based IDS is able to handle large networks with thousands of devices, and the same scheme extends to even larger networks with millions of devices. Finally, in Fig. 15, we evaluate the mitigation performance of the proposed method (see Section III-C). Since the method in [Informetric] monitors the aggregate traffic at nodes, it is not straightforward for it to identify the attacking devices. Thus, to evaluate our mitigation performance, we consider the data filtering method, which simply applies a threshold to the observed raw data. The reported Area Under the Curve (AUC) values in Fig. 15 illustrate the successful mitigation performance of the proposed method under a challenging stealthy attack scenario.

V Limitations and Future Work

In this work, we proposed a novel intrusion detection system which is capable of quickly and accurately detecting and mitigating a broad set of IoT-empowered attacks, in particular stealthy low-rate DDoS attacks. However, there are still some limitations which need to be addressed to make the system more robust to attacks in the future. First, it is assumed that the nominal behavior of the devices does not change over time, so the IDS needs to be trained only once. However, in a real system implementation the IDS needs to be updated periodically. Second, feature extraction plays an important role, as the number of packets or the packet size might not always exactly represent the characteristics of a real network. For future work, we plan to investigate other aspects of dynamic networks such as continual learning under changing nominal network traffic.

VI Conclusion

With the proliferation of IoT devices, and the ease of triggering DoS attacks even by unsophisticated malicious parties, there is an increasing need for developing solutions to DDoS via IoT, especially the recent stealthy DDoS attacks. In this context, we presented a general and emerging threat model for hierarchical IoT networks. We then introduced a novel intrusion detection and mitigation framework that employs an online, scalable and nonparametric anomaly detection algorithm. Through real and simulated data, as well as an IoT testbed, we evaluated the performance of the proposed detection and mitigation scheme under challenging stealthy DDoS attack scenarios. Applications of the proposed scheme to large and dynamic networks with a varying number of devices were also considered.

References

Proof of Theorem 1

Consider a hypersphere centered at with radius , the NN distance of with respect to the training set . The maximum likelihood estimate for the probability of a point being inside under is given by . It is known that, as the total number of points grow, this binomial probability estimate converges to the true probability mass in in the mean square sense [agresti2018introduction], i.e.,

as . Hence, the probability density estimate

where is the volume of

, converges to the actual probability density function,

as , since shrinks and . Similarly, considering a hypersphere around which includes points within its radius , we see that as , and

Assuming a uniform distribution

we conclude with

as .

Proof of Theorem 2

In online testing (see lines 6-11), the most expensive part is to compute , in particular . And within the expensive part is to find the th nearest neighbor, which is if computed straightforwardly by computing the distance of test point to all training points. The space complexity of the algorithm is due to storing training points, each of which is -dimensional, i.e., . Note that the both time and space complexity of the mitigation part shown in lines 13-23 is where is a bounded number close to the detection delay, typically much smaller than . In training, to compute shown in line 4, th nearest neighbor among points are computed for each of points, requiring computations. However, training is performed once offline, so the complexity of online testing is usually critical for scalability.