DÏoT: A Crowdsourced Self-learning Approach for Detecting Compromised IoT Devices

04/20/2018 ∙ by Thien Duc Nguyen, et al. ∙ Aalto University ∙ Technische Universität Darmstadt

IoT devices are being widely deployed. Many of them are vulnerable due to insecure implementations and configurations. As a result, many networks already contain vulnerable devices that are easy to compromise. This has led to a new category of malware specifically targeting IoT devices. Existing intrusion detection techniques are not effective in detecting compromised IoT devices given the massive scale of the problem in terms of the number of different manufacturers involved. In this paper, we present DÏoT, a system for effectively detecting compromised IoT devices. In contrast to prior work, DÏoT uses a novel self-learning approach to classify devices into device types and to build for each of these types a normal communication profile that can subsequently be used to detect anomalous deviations in communication patterns. DÏoT is completely autonomous and can be trained in a distributed, crowdsourced manner without requiring human intervention or labeled training data. Consequently, DÏoT copes with the emergence of new device types as well as new attacks. By systematic experiments using more than 30 real-world IoT devices, we show that DÏoT is effective (96% detection rate with no false alarms) and fast (<0.03 s) at detecting devices compromised by the infamous Mirai malware.


I Introduction

The growing popularity of the Internet-of-Things (IoT) has led to many new device manufacturers entering the IoT device market, bringing out products at an ever-increasing pace. This “rush-to-market” mentality of some manufacturers has led to poor product design practices in which security considerations often remain merely an afterthought. As a consequence, many devices are released with inherent security vulnerabilities that can be exploited through various attacks. An entirely new category of malware has emerged explicitly targeting IoT devices, as these are increasingly popular and relatively easy to compromise [1, 2, 3, 4].

The preferred way to cope with security vulnerabilities would be to apply security patches through software and firmware updates on affected devices [5]. However, many devices lack appropriate facilities for automated updates or there may be significant delays until device manufacturers provide them, mandating the use of reactive security measures like intrusion detection systems (IDS) for detecting possible device compromise  [6, 7, 8, 9]. Signature-based IDSs look for specific communication patterns, so-called attack signatures, associated with known attacks. Such systems are, however, unable to detect novel attacks for which they do not yet have signatures, leaving the network unprotected until the IDS vendor releases updated attack signatures [6].

Detecting previously unknown attacks therefore requires anomaly detection, in which the normal behavior of devices is profiled and potential attacks are detected as deviations from this normal behavior profile [7, 8, 9]. However, this approach often suffers from a high false alarm rate, making it unusable in practice. This problem is exacerbated in the IoT setting: First, there are hundreds of very heterogeneous devices on the market, which makes it even more challenging to train precise models covering all variations of behaviors exhibited by various IoT devices. Second, individual IoT devices typically do not (notwithstanding a few exceptions) generate much network traffic, as their communications are limited to, e.g., status updates about sensor readings or (relatively) infrequent interactions related to user commands. This scarcity of communications makes it challenging in itself to train comprehensive models that can accurately cover the full behavior of IoT devices.

To be effective, an anomaly detection model must capture all benign patterns of behavior in order to differentiate benign from malicious behavior. Given the ever-increasing number of IoT device types, literally thousands of them (ranging from temperature sensors and smart light bulbs to large appliances like washing machines), and the typical scarcity of their communications, an all-encompassing behavior model would be 1) tedious to learn and update, and 2) too broad to detect subtle anomalies without generating many false alarms.

Goals and Contributions. To tackle the challenges that effective intrusion detection in IoT networks faces, we present DÏoT, a system for detecting compromised IoT devices that is effective without suffering from the deficiencies discussed above. We propose a novel approach that combines automated device-type identification and subsequent device-type-specific anomaly detection to achieve accurate detection of attacks while generating almost no false alarms. Major IoT device vendors, including Cisco, assisted us in formulating real-world settings and usage scenarios for our solution.

We make the following contributions:


  • DÏoT, a self-learning distributed system for security monitoring of IoT devices (Sect. III) based on device-type-specific detection models for detecting anomalous device behavior:

    • It uses a novel self-learning identification method based on passive fingerprinting of the periodic communication traffic of IoT devices (Sect. IV). In contrast to previous methods, it requires neither prior knowledge about device types nor labeled training data, and it is effective at identifying the type of an IoT device in any state of a device’s operation (achieving 98.2% accuracy, Sect. VIII).

    • It utilizes a novel anomaly detection approach that represents network packets as symbols in a language, allowing a language analysis technique to be used to detect anomalies (Sect. V). It is fast (detection in less than 0.03 s) and effective (high true positive rate, zero false alarms) at detecting IoT devices infected with real IoT malware (Mirai [1]) (Sect. IX).

    • It is the first system to apply a federated learning approach for anomaly-detection-based intrusion detection, aggregating behavior profiles efficiently (Sect. VI).

II Preliminaries

II-A IoT Malware

Recently, a number of large-scale attacks utilizing vulnerabilities in IoT devices have been widely reported. The best-known is the Mirai malware [1], which specifically targeted IoT devices with basic security flaws [10, 3]. Subsequently, other similar malware like Persirai [11], Hajime [2] and BrickerBot [4] has emerged. Pa et al. [12] identified that IoT malware attacks can be divided into three stages: intrusion, infection and monetization. The intrusion stage utilizes weaknesses like default administrator or root passwords, or exploits of known vulnerabilities in particular IoT devices, to gain unauthorized access to devices. In the infection stage, attackers upload a piece of malicious code to the device and execute it. In the monetization stage, the malware typically performs network scans to identify other vulnerable devices, causing these devices to become infected as well. Finally, the malware takes malicious actions like acting as part of a botnet for distributed denial of service (DDoS) attacks (e.g., in the case of Mirai [3]). Other monetization methods may include unauthorized leakage of information from the user’s network to outsiders.

II-B Device-Type Identification

Earlier device-type identification schemes have the primary goal of using various device fingerprinting approaches for identifying either the device model [13] or the specific hardware / software configuration of a device [14, 15, 16, 17] by training classification models with labeled data from specific known device types. Such training data requires extensive human effort to generate and maintain.

DÏoT takes a different approach: the purpose of identification in DÏoT is to enable efficient anomaly detection. Hence, there is no need to identify the real-world model of each device. It is sufficient to reliably map devices to a “device type” for which the system can build a model of normal behavior that can be used to effectively detect anomalous deviations. Therefore DÏoT can be trained without the need to manually label the communication traces of pre-defined real-world device types. Rather, a clustering algorithm is used to identify abstract device types (cf. Sect. IV) to which devices are mapped based on their observed communication behavior. The training and evaluation of the anomaly detection models (cf. Sect. V) are performed in terms of these abstract device types. This allows DÏoT to be trained and operated autonomously, without the need for human intervention at any stage.

III System Model

Our system model is shown in Fig. 1. We consider a typical SOHO (small office / home office) network, where IoT devices connect to the Internet via an access gateway.

III-A Adversary Model and Assumptions

Adversary. The adversary is IoT malware performing attacks against, or launching attacks from, vulnerable devices in the SOHO network. We consider all actions the malware performs to discover, infect and exploit vulnerable devices, as discussed in detail in Sect. VII-A3.

Defense goals. The primary goal of DÏoT is to detect attacks on IoT devices in order to take appropriate countermeasures, e.g., by preventing targeted devices from being compromised or isolating compromised devices from the rest of the network. We aim to detect attacks at the earliest stage possible, preferably even before a device can be successfully infected.

In addition, we make the following assumptions:


  • A1 - No malicious manufacturers. IoT devices may be vulnerable but are not compromised when first released by a manufacturer. Adversaries must first find a vulnerability and a way to exploit it, which takes some time during which non-compromised devices generate only legitimate communications, leaving sufficient time (cf. Sect. IX-C) to learn benign models of device behavior.

  • A2 - Security Gateway is not compromised. Since Security Gateway is the device enforcing security in the SOHO network, we assume that it is not compromised. As with firewall devices or antivirus software, the SOHO network is no longer protected if Security Gateway itself is compromised. Several approaches can be used to protect it. For instance, if Security Gateway supports a suitable trusted execution environment, like Intel SGX [18] or a Trusted Platform Module, its integrity can be remotely verified using remote attestation techniques [19].

III-B Challenges

Anomaly detection techniques face challenges in the IoT application scenario:


  • C1- Dynamic threat landscape. New IoT devices are released on a daily basis. A significant fraction of them have security vulnerabilities. Exploits targeting vulnerable devices are also being developed by adversaries at a similarly high pace. This makes the threats against IoT devices highly dynamic and ever-increasing.

  • C2- Resource limitations. IoT devices have limited capabilities w.r.t. available memory, computing resources and energy often making it infeasible to perform on-device detection.

  • C3- IoT device heterogeneity and false alarms. Behaviors of different IoT devices are very heterogeneous, so that anomaly detection techniques easily raise false alarms. However, to be useful in practice, anomaly detection systems must minimize false alarms.

  • C4- Scarcity of communications. In contrast to high-end devices, IoT devices generate only little traffic, often triggered by infrequent user interactions.

Fig. 1: DÏoT system model

III-C System Design

III-C1 Design Choices

Gateway monitoring: We detect compromised IoT devices by monitoring their communication as observed by Security Gateway, which acts as the gateway for the local network. All IoT devices are directly or indirectly connected to this gateway, which observes all external communications to the Internet as well as most local device-to-device communications. It represents an extensive and unconstrained monitoring point, effectively addressing challenge C2.

Device-type-specific anomaly detection: Since IoT devices have heterogeneous behavior (challenge C3), we model each device-type’s legitimate behavior with a dedicated model. Consequently, each anomaly detection model captures a relatively homogeneous and limited behavior, representing a single device type. This approach leads to a restrained model that is able to capture all possible legitimate behaviors of a device type. Thus the model is expected to be more sensitive to subtle anomalies, increasing its detection capability, and less prone to trigger false alarms. This design choice partially addresses challenge C3.

Autonomous self-learning system: DÏoT learns anomaly detection profiles using data samples that are labeled only with the device type that generated them. These labels representing individual device types are automatically generated and assigned. This is done by building fingerprints for the communication patterns of each IoT device. DÏoT uses an unsupervised machine learning approach to cluster these fingerprints and autonomously creates a label for each cluster representing a device type (device-type identification in Sect. IV). The whole process does not require any human intervention, which allows DÏoT to respond quickly and autonomously to new threats, addressing challenge C1. It is worth noting that DÏoT starts operating with no device-type identification or anomaly detection model. It learns and improves these models as Security Gateways aggregate more data.

Information aggregation: Information gathered by different Security Gateways is correlated in a central entity, the IoT Security Service. Anomaly detection models are learned using a federated learning approach where Security Gateways use locally collected data and collaborate with IoT Security Service to train the models (details in Sect. VI). IoT Security Service similarly uses fingerprints generated at several Security Gateways to learn device-type identification models (details in Sect. IV-C). This aggregation maximizes the usage of limited information obtained from scarce communications at each gateway (challenge C4). It is also expected to improve the accuracy of anomaly detection models (challenge C3) by learning from the maximum amount of data available.

Modeling techniques requiring little data: As presented later in Sect. IV and V, we define features and select machine learning algorithms that work with little training data, for both device-type identification and anomaly detection. This design choice addresses challenge C4.

III-C2 System Architecture

The DÏoT system consists of Security Gateway and IoT Security Service. The role of Security Gateway is to monitor devices and perform device fingerprinting and anomaly detection in order to identify compromised devices in the network. It is supported by IoT Security Service that performs device-type identification based on the fingerprints provided and aggregates device-type-specific anomaly detection models used by Security Gateway.

Security Gateway acts as the local access gateway to the Internet to which IoT devices connect over WiFi or an Ethernet connection. Apart from acting as a gateway router for connected devices in the local network, Security Gateway hosts two functions: the Device Fingerprinting and Anomaly Detection components. The task of Device Fingerprinting is to monitor the communication patterns of connected IoT devices and extract device fingerprints for identifying the device type of the connected device (details in Sect. IV). Device Fingerprinting is a one-time operation that is performed when a new IoT device is detected in the network. An identified device is assigned to its corresponding type which it retains permanently unless fingerprinting is re-initiated, e.g., due to a detected firmware update on the device (cf. Sect. X). The Anomaly Detection component continuously monitors the communications of identified IoT devices and detects devices displaying abnormal communication behavior that is potentially caused by malware (details in Sect. V). Security Gateway also provides locally collected data to IoT Security Service for learning device-type identification and anomaly detection models.

IoT Security Service supports Security Gateway. It is a cloud-based functionality hosting two main components: Device-Type Identification and Anomaly Detection Model. Device-Type Identification uses a machine learning-based classifier for identifying the device type of IoT devices based on device fingerprints provided by Security Gateway. Anomaly Detection Model maintains a repository of device-type-specific anomaly detection models. After successful identification of a device’s type, IoT Security Service sends the identified device type and corresponding anomaly detection model to Security Gateway. Upon receiving the anomaly detection model for the type of an identified IoT device, Security Gateway starts monitoring its communications in order to detect potential deviations from normal behavior encoded by the detection model.
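To make the division of responsibilities concrete, the following minimal Python sketch mirrors this architecture; all names and the stubbed helper functions are illustrative placeholders, not DÏoT's actual implementation:

```python
from dataclasses import dataclass, field

# Placeholder hooks standing in for the components described in Sect. IV and V.
def extract_fingerprint(traffic_capture):      # fingerprint extraction, Sect. IV-B
    return tuple(traffic_capture)

def classify_or_create(fingerprint):           # kNN-based identification, Sect. IV-C
    return "type#01"

def check_for_anomaly(model, packet):          # GRU-based detection, Sect. V
    return False

@dataclass
class IoTSecurityService:
    """Cloud component: identifies device types and serves per-type models."""
    models: dict = field(default_factory=dict)        # device type -> detection model

    def identify(self, fingerprint):
        device_type = classify_or_create(fingerprint)
        return device_type, self.models.get(device_type)

@dataclass
class SecurityGateway:
    """Local gateway: fingerprints new devices, then monitors identified ones."""
    service: IoTSecurityService
    detectors: dict = field(default_factory=dict)     # device MAC -> (type, model)

    def on_new_device(self, mac, traffic_capture):
        fingerprint = extract_fingerprint(traffic_capture)
        self.detectors[mac] = self.service.identify(fingerprint)

    def on_packet(self, mac, packet):
        device_type, model = self.detectors.get(mac, (None, None))
        if model is not None:
            return check_for_anomaly(model, packet)
```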

IV Device-Type Identification

Traditional device-type identification approaches [20, 21] rely on aggregated statistics extracted from dense network traffic. These are ineffective when applied to IoT devices due to the scarcity of their communication (cf. Sect. III-B - challenge C4). IoT devices generate little dense traffic, typically only during rare and short user interactions. Nevertheless, IoT devices also generate background communication independent of user interactions. This traffic is always present, relatively constant and periodic.

Thus, we introduce a novel technique for identifying the type of IoT devices based on their periodic background network traffic. In contrast to existing approaches, this technique can identify the type of an IoT device in any state of a device’s operation, including standby, with a constant time of 30 minutes. Our technique is composed of three steps relying on passive monitoring of the network traffic at the network gateway: step 1: inference of periodic flows, their period and stability (Sect. IV-A), step 2: extraction of a fingerprint characterizing a device’s type based on its periodic flows (Sect. IV-B) and step 3: use of this fingerprint in a classification system that identifies device-types (Sect. IV-C). The overview of the identification process is depicted in Fig. 2. Steps 1 and 2 are implemented in Security Gateway while step 3 is implemented in IoT Security Service.

Fig. 2: Overview of device-type identification.

IV-A Periodic Flow Inference

The first step in our device-type identification technique is to infer the periodicity in the communication of a device. Fourier transform and signal autocorrelation are effective signal processing techniques for inferring periodicity. We divide the network traffic of a device into distinct flows and apply these techniques to the flows.

While the Fourier transform and signal autocorrelation can identify several distinct periods of a signal, ignoring most non-periodic noise, these techniques are more accurate when applied to pure single-periodic signals. Therefore, we pre-process the network traffic received at Security Gateway and divide it into distinct flows. We define a flow as a sequence of network packets sent from a given source MAC address (IoT device) using a given communication protocol (e.g., NTP, ARP, RTSP, etc.). The rationale for flow division is that most periodic communication uses dedicated protocols that are different from the ones used for communication related to user interaction (non-periodic). If periodic and non-periodic communication still coexist in a flow (e.g., HTTP), the Fourier transform and signal autocorrelation can cope better with this reduced non-periodic noise.

The flow of packets in a network capture must be converted into a format suitable for signal processing. We discretize each flow into a binary time series sampled at one value per second, indicating whether the flow contained one or more packets during the 1-second period (value 1) or not (value 0). The computed time series is a discrete binary signal s of duration T seconds.
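For illustration, a minimal sketch of this discretization step (the input format, a list of packet timestamps per flow, is an assumption):

```python
import numpy as np

def discretize_flow(packet_times, capture_start, capture_duration):
    """Turn the packet timestamps of one flow into a 1 Hz binary time series:
    each sample is 1 if the flow contained at least one packet during the
    corresponding 1-second interval, and 0 otherwise."""
    signal = np.zeros(int(capture_duration), dtype=np.int8)
    for t in packet_times:
        idx = int(t - capture_start)
        if 0 <= idx < len(signal):
            signal[idx] = 1
    return signal
```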

We first use the discrete Fourier transform (DFT) [22] to identify candidate periods for a given flow. The DFT converts the discrete signal s from the time domain to the frequency domain: S = DFT(s). S(f) provides an amplitude value for each frequency f. The frequency f_max resulting in the largest amplitude gives the periodicity of the dominant period in s. Secondary periods of lower amplitude also exist. We select as candidate periods those having an amplitude larger than 10% of the maximum amplitude S(f_max). We discard close candidate periods by selecting only local maxima of S, i.e., S(f) is considered a local maximum if it has the largest amplitude in a small neighborhood of frequencies around f. The result of this operation is a list of candidate periods for a flow.

Candidate periods found using the DFT can be nonexistent or inaccurate. To confirm and refine these periods, we compute the discrete autocorrelation A of s. A(k) denotes the similarity of the signal with itself at time offset k. If A(k) is large and reaches a local maximum, it means that s is likely periodic with period k and that this period occurs approximately A(k) times over the signal duration T. For each candidate period p obtained with the DFT, we confirm and refine it by analyzing the values of A on a small range of offsets around p. If this range contains a local maximum A(p'), we confirm the existence of a period in this range and update its value to p'. For each resulting period p we compute the characteristic metrics r_p and rn_p, defined as:

r_p = (A(p) · p) / T    (1)

rn_p = ((A(p−1) + A(p) + A(p+1)) · p) / T    (2)

r_p computes the ratio of occurrences of period p over the signal of duration T seconds. An accurate and stable periodic signal of period p renders r_p ≈ 1. However, a periodic signal may be noisy (r_p < 1) or have parallel periods with the same periodicity (r_p > 1). Periodic signals may also be unstable, exhibiting slight differences in their periodicity, with occurrences spread over p−1, p and p+1. This is the rationale for computing rn_p, where we sum the occurrences of the neighboring periods p−1, p and p+1. A stable signal of period p produces rn_p ≈ r_p, while unstable signals produce r_p < 1 and rn_p > r_p.

The final result of period inference for a flow is a set of periods with their corresponding ratios: {(p, r_p, rn_p)}.
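A compact sketch of this period-inference step using numpy; the 10% amplitude cutoff comes from the text, while the size of the local-maximum neighborhood and the use of the autocorrelation value as the occurrence count are assumptions:

```python
import numpy as np

def candidate_periods(signal):
    """Candidate periods (in seconds) from the DFT of a 1 Hz binary signal."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0)
    spectrum[0] = 0.0                              # ignore the DC component
    cutoff = 0.10 * spectrum.max()                 # 10% of the maximum amplitude
    candidates = set()
    for i in range(1, len(spectrum) - 1):
        local_max = spectrum[i] >= spectrum[i - 1] and spectrum[i] >= spectrum[i + 1]
        if spectrum[i] > cutoff and local_max:
            candidates.add(int(round(1.0 / freqs[i])))
    return sorted(candidates)

def refine_periods(signal, candidates, neighborhood=2):
    """Confirm/refine candidate periods with autocorrelation; compute r and rn."""
    sig = np.asarray(signal, dtype=float)
    T = len(sig)
    acf = np.correlate(sig, sig, mode='full')[T - 1:]   # acf[k] = similarity at lag k
    results = []
    for p in candidates:
        lo, hi = max(2, p - neighborhood), min(T - 2, p + neighborhood)
        if lo > hi:
            continue
        p_ref = lo + int(np.argmax(acf[lo:hi + 1]))     # refined period
        occ = acf[p_ref]                                # occurrences of the period
        occ_n = acf[p_ref - 1] + acf[p_ref] + acf[p_ref + 1]
        results.append((p_ref, occ * p_ref / T, occ_n * p_ref / T))   # (p, r, rn)
    return results
```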

Example: Figure 3 shows the plot of binary time series extracted from flows of a D-LinkCam DCS935L IP camera. We see that all depicted flows are periodic. Applying DFT and autocorrelation on these time series provides the following results:

ARP: (period:55, r:0.735, rn:0.857)
HTTPS: (period:55, r:0.857, rn:1.102)
mDNS: (period:25, r:2.171, rn:4.399)
port 62976: (period:30, r:0.969, rn:0.969)

We see that our method is able to accurately infer all periods observed in Fig. 3. The flow on TCP port 62976 has the most stable period (30 s), as highlighted by the values r = rn = 0.969. ARP and HTTPS (port 443) both have a less stable period of 55 s, as highlighted by lower r values and a larger difference between r and rn. We also inferred the 25 s period of the mDNS flow (port 5353). But as we can observe in Fig. 3, there are three different signals having a 25 s periodicity on this flow. This aspect is captured in period inference by rendering high r and rn values, i.e., far larger than 1. These results show that our method detects periodic flows, accurately infers their period and characterizes them with the r and rn metrics.

Fig. 3: Four binary time series extracted from periodic flows of a D-LinkCam DCS935L IP camera. The flows correspond to ARP protocol, HTTPS (port 443), mDNS (port 5353) and TCP port 62976. All flows are periodic.

IV-B Fingerprint Extraction

We build a fingerprint for a device type by extracting features from its periodic flows. These features are later used with an unsupervised machine learning algorithm that creates and assigns device-type labels to fingerprints (Sect. IV-C).

We split a network traffic capture of duration T seconds into three equal sub-captures. We apply periodic flow inference (Sect. IV-A) on each sub-capture and on the whole capture. We obtain four sets of periods with the metrics r and rn for each flow. The goal of applying period inference on smaller sub-captures is twofold. First, we obtain more significant results by discarding periods that are inferred from fewer than two sub-captures. Second, we can compute statistics from the metrics r and rn to measure their stability.

The results from period inference are grouped by source MAC address, i.e., linked to a single device. This grouping defines the granularity of feature extraction: one fingerprint is extracted per source MAC address and capture. We introduce 33 features that compose our device-type fingerprint. These features are manually designed to model a group of periodic flows in a manner that enables distinguishing device types. It is worth noting that all our features are computed from the statistics obtained during periodic flow inference (Sect. IV-A). They use neither packet payload information nor packet header information from protocols above the transport layer. Consequently, DÏoT can operate on any traffic encrypted above the transport layer. There are four categories of features, as discussed below and in Tab. I.

Periodic flows (9 features). This feature category characterizes the quantity and quality of periodic flows. It includes the count of periodic flows (1), the layer of protocols that support periodic flows (2), if flows are single- or multi-periodic (3-6), if there is a change in the source port of periodic flows (7) and the frequency of this change (8-9).

Period accuracy (3 features). These features measure the accuracy of the inferred periods and characterize how noisy the flows they were extracted from are. They consist of the count of periods that were inferred from all sub-captures and the whole capture (10), and the mean (11) and standard deviation (12) of the count of sub-captures from which each period was inferred.

Period duration (4 features). These features (13-16) represent the number of periods that fall into each of four duration ranges. The ranges were manually chosen to segregate periods according to their relative duration, from short to long. Periods of less than 5 seconds or more than 10 minutes are discarded, since identifying long periods requires long traffic captures, which slows down fingerprint extraction.

Period stability (17 features). Features in this category measure the stability of the inferred periods using the r and rn metrics, as discussed in Sect. IV-A. The mean and standard deviation (SD) of the r and rn metrics are computed for each flow and period. Features 17-20, respectively 24-27, are calculated by binning the values of Mean(r), respectively Mean(rn), into four ranges and counting the number of values in each bin. The bin ranges of the mean r and rn values were selected to distinguish noisy from pure single-period flows as well as different multi-periodic flows. Features 21-23, respectively 28-30, are calculated by binning the values of SD(r), respectively SD(rn), into three ranges and counting the number of values in each bin. These ranges were selected to distinguish very stable from stable and unstable periodic flows. Features 31-33 are computed by binning the values of the difference Mean(rn) − Mean(r) into three ranges and counting the corresponding bin cardinalities. These ranges were selected to characterize the differences between stable and unstable periods of flows.

Category: periodic flows
  f1   # periodic flows                                 0.440
  f2   # periodic flows (protocol layer 4)              0.465
  f3   Mean periods per flow                            0.068
  f4   SD periods per flow                              0.037
  f5   # flows having only one period                   0.429
  f6   # flows having multiple periods                  0.176
  f7   # flows with static source port                  0.533
  f8   Mean frequency of source port change             0.310
  f9   SD frequency of source port change               0.137

Category: period accuracy
  f10  # periods inferred in all sub-captures           0.329
  f11  Mean period inference success                    0.037
  f12  SD period inference success                      0.022

Category: period duration
  f13  # periods in duration range 1                    0.409
  f14  # periods in duration range 2                    0.408
  f15  # periods in duration range 3                    0.467
  f16  # periods in duration range 4                    0.419

Category: period stability
  f17  # Mean r values in bin 1                         0.386
  f18  # Mean r values in bin 2                         0.436
  f19  # Mean r values in bin 3                         0.239
  f20  # Mean r values in bin 4                         0.124
  f21  # SD r values in bin 1                           0.185
  f22  # SD r values in bin 2                           0.151
  f23  # SD r values in bin 3                           0.185
  f24  # Mean rn values in bin 1                        0.288
  f25  # Mean rn values in bin 2                        0.307
  f26  # Mean rn values in bin 3                        0.313
  f27  # Mean rn values in bin 4                        0.246
  f28  # SD rn values in bin 1                          0.217
  f29  # SD rn values in bin 2                          0.217
  f30  # SD rn values in bin 3                          0.220
  f31  # (Mean rn − Mean r) values in bin 1             0.408
  f32  # (Mean rn − Mean r) values in bin 2             0.248
  f33  # (Mean rn − Mean r) values in bin 3             0.482

TABLE I: The 33 features (4 categories) used for device-type identification. # denotes a count, SD the standard deviation. Importance scores are computed using the ReliefF feature selection algorithm [23]; high scores correspond to the most relevant features, low scores to the least relevant ones.
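For illustration, a small sketch of how a few of these features could be computed from the per-flow period-inference results; the bin boundaries used here are placeholders, not the ones used in DÏoT:

```python
def fingerprint(flow_periods):
    """Compute a few illustrative fingerprint features.

    flow_periods: dict mapping a flow id to the list of (period, r, rn)
                  tuples produced by period inference for that flow."""
    periodic = {f: ps for f, ps in flow_periods.items() if ps}
    features = {
        'n_periodic_flows': len(periodic),                                      # ~f1
        'n_single_period_flows': sum(len(ps) == 1 for ps in periodic.values()), # ~f5
        'n_multi_period_flows': sum(len(ps) > 1 for ps in periodic.values()),   # ~f6
    }
    # Period-stability style features: bin the per-flow mean of r
    # (bin edges below are placeholders, not the boundaries used in DÏoT).
    mean_r = [sum(r for _, r, _ in ps) / len(ps) for ps in periodic.values()]
    bins = [(0.0, 0.5), (0.5, 0.9), (0.9, 1.1), (1.1, float('inf'))]
    for i, (lo, hi) in enumerate(bins):
        features[f'mean_r_bin_{i + 1}'] = sum(lo <= m < hi for m in mean_r)     # ~f17-f20
    return features
```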

IV-C Device-Type Fingerprint Classification

Our device-type identification technique is designed to be fully autonomous. It requires neither human interaction nor labeled data to operate. When an IoT device is associated with a Security Gateway, the latter monitors its network traffic and extracts a fingerprint as described in Sect. IV-B. The fingerprint is sent to IoT Security Service, which attempts to identify the type of the device having this fingerprint. If the fingerprint has a match, the type of the device is identified and the fingerprint is used to retrain and improve its identification model. If no match is found, IoT Security Service uses the fingerprints to learn a model for this new device type.

As mentioned earlier, the device types we use here are abstract. They do not refer to meaningful pre-learned labels such as "D-LinkCam IP camera" but are reference identifiers specific to DÏoT, e.g., type#12. These identifiers match the behavior models used for anomaly detection, as presented later in Section V.

The system starts operating with no identification model. As IoT Security Service receives fingerprints from Security Gateway, it creates type identifiers (e.g., type#12) and learns an identification model for them. The longer the system runs and the more Security Gateways contribute to it, the more device types it is able to identify and the better the accuracy of identification.

We implement automated device-type identification using a supervised k-Nearest Neighbors (kNN) classifier [24]. kNN is chosen because of its ability to deal with a large number of classes and an imbalanced dataset. Each device type is represented by one class, and the training data available for each class may be imbalanced (as IoT devices are deployed in different numbers). kNN forms small clusters of at least k neighbors to represent a class. In a supervised mode, several clusters can define a class, capturing its potential diversity. This allows fingerprints collected from a device whose type is already known to form new clusters with the same type label. When fingerprints for device types unknown to the model are processed, they are detected as exceeding a threshold distance to the nearest cluster of the classification model. A new class can then be added to the model to represent this yet unknown device type.

Our features are already processed and should not require complex associations to differentiate device types. Consequently, we use the Euclidean distance as the distance measure in kNN. All 33 features of our fingerprints are scaled to a common range so that each has equal weight in the classification task. Fingerprints are extracted from network traffic captures of 30 minutes. We tested several capture durations: a duration lower than 30 minutes missed flows of long periodicity (up to 10 minutes) and degraded the accuracy of identification, while a duration longer than 30 minutes did not improve accuracy but increased the delay to identify a device. We set k = 5 to meet a trade-off between the representativeness of a learned class and the need for training data. A class for a new device type can be learned as soon as we obtain five fingerprints for it, i.e., after 2.5 hours of monitoring.
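A minimal sketch of this threshold-based kNN classification, written with scikit-learn (the library choice and the distance threshold value are our assumptions):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import MinMaxScaler

K = 5                  # minimum fingerprints per class (Sect. IV-C)
DIST_THRESHOLD = 0.5   # placeholder value; tuned in a lab setup in the paper

class FingerprintClassifier:
    """Threshold-based kNN over 33-feature fingerprints (sketch)."""

    def __init__(self):
        self.scaler = MinMaxScaler()   # scale all features to a common range
        self.X, self.y = [], []        # stored fingerprints and type labels
        self.next_type_id = 1
        self.nn = None

    def fit(self):
        Xs = self.scaler.fit_transform(np.array(self.X))
        self.nn = NearestNeighbors(n_neighbors=K, metric='euclidean').fit(Xs)

    def classify(self, fingerprint):
        """Return an existing type label, or create a new abstract type."""
        distances, indices = self.nn.kneighbors(self.scaler.transform([fingerprint]))
        if distances.mean() > DIST_THRESHOLD:
            label = f'type#{self.next_type_id:02d}'      # unknown device type
            self.next_type_id += 1
        else:
            labels = [self.y[i] for i in indices[0]]
            label = max(set(labels), key=labels.count)   # majority vote
        self.X.append(list(fingerprint))
        self.y.append(label)           # re-run fit() after adding fingerprints
        return label
```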

The design of our fingerprint classification approach does not require any labeled data to operate. It allows DÏoT to learn and label device types without human intervention by clustering fingerprints and generating labels for the clusters. Four parameters need to be tuned for device-type identification prior to deployment of DÏoT: the traffic capture duration, the sampling period of the flows, the number of neighbors k, and the threshold distance for kNN. Optimal values for these parameters can be determined in a lab setup using a small set of IoT devices. After that, DÏoT can run in a fully autonomous manner, without human intervention. Our device-type identification approach allows DÏoT to manage a large number of device types, since these are represented as clusters in a high-dimensional space (33 dimensions). A multitude of non-overlapping clusters can be created in this space. The addition of new device types to the system is an automatic process of creating new clusters in this space.

V Device-Type-Specific Anomaly Detection

Our anomaly detection approach is based on evaluating the communication patterns of a device to determine whether they are consistent with the learned benign communication patterns of that particular device type. The detection process is shown in Fig. 4. In Step 1, the communication between Security Gateway and the IoT device is captured as a sequence of packets p_1, p_2, .... In Step 2, each packet p_i is mapped to a corresponding symbol s_i characterizing the type of the packet, using a mapping based on distinct characteristics derived from each packet's header information, as discussed in Sect. V-A. In Step 3, the mapped sequence of symbols is input into a pre-trained model using Gated Recurrent Units (GRUs) [25, 26]. The GRU model calculates a probability estimate for each symbol based on the sequence of preceding symbols. GRU is a recent approach to recurrent neural networks (RNN) that is currently the target of lively research. GRUs provide similar accuracy as other RNN approaches but are computationally less expensive [26, 27]. In Step 4, the sequence of occurrence probability estimates is evaluated to determine possible anomalies. If the occurrence probabilities of a sufficient number of packets in a window of consecutive packets fall below a detection threshold, as described in detail in Sect. V-B, the packet sequence is deemed anomalous and an alarm is raised.

Fig. 4: Overview of device-type-specific anomaly detection

V-A Modelling Packet Sequences

Data packets p_i in the packet sequence emitted by an IoT device are mapped into packet symbols s_i based on 7-tuples of discrete packet characteristics of packet p_i. This mapping is defined by a device-type-specific mapping function m_t: P → S_t, where P is the domain of raw network packets and S_t is the domain of packet symbols for device type t. The mapping m_t assigns each unique combination of packet characteristics a dedicated symbol representing the 'type' of the particular packet. We use the following packet characteristics, also shown in Tab. II (an illustrative sketch of this mapping is given after Tab. II):


  • direction: (incoming / outgoing) Normal TCP traffic is usually balanced two-way communication, whereas abnormal traffic often is not; e.g., a bot running a DDoS attack only sends packets to a victim without receiving replies.

  • local and remote port type: (system / user / dynamic) Each device type uses specific ports chosen by the manufacturer, while malicious attack patterns usually use different ports.

  • packet length: (bin index of the packet's length, where the eight most frequently occurring packet lengths receive dedicated bins and one bin collects all other packet lengths) Each device type communicates using specific packet patterns with specific packet lengths, which mostly differ from those of malicious attack patterns.

  • TCP flags: Normal communications contain packets with specific TCP flag sequences, e.g., a proper connection establishment handshake. However, many attacks do not follow standard protocols; e.g., a SYN flood (DDoS attack) only sends SYN messages.

  • encapsulated protocol types: Each device type usually uses a set of specific protocols, which is likely different from protocol types used in attacks.

  • IAT bin: (bin index of the packet inter-arrival time (IAT), using three bins) Many attacks (e.g., DDoS) usually generate traffic at a high packet rate, resulting in smaller IAT values than normal communications.

ID  Characteristic     Value
1   direction          incoming / outgoing
2   local port type    bin index of port type
3   remote port type   bin index of port type
4   packet length      bin index of packet length
5   TCP flags          TCP flag values
6   protocols          encapsulated protocol types
7   IAT bin            bin index of packet inter-arrival time
TABLE II: Packet characteristics used in symbol mapping
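To make this mapping concrete, a minimal Python sketch follows; the IANA port ranges (0-1023 system, 1024-49151 user, 49152-65535 dynamic) are standard, while the IAT bin boundaries and the dictionary-based symbol allocation are illustrative assumptions rather than DÏoT's exact implementation:

```python
def port_type(port):
    """IANA port classes: system (0-1023), user (1024-49151), dynamic (49152+)."""
    if port < 1024:
        return 'system'
    return 'user' if port < 49152 else 'dynamic'

def iat_bin(iat_seconds, bounds=(0.001, 0.1)):
    """Three inter-arrival-time bins; the boundaries are illustrative."""
    if iat_seconds < bounds[0]:
        return 0
    return 1 if iat_seconds < bounds[1] else 2

class SymbolMapper:
    """Device-type-specific mapping m_t from packets to symbols."""

    def __init__(self, frequent_lengths):
        # The eight most frequent packet lengths get dedicated bins (0-7),
        # every other length falls into bin 8.
        self.length_bins = {l: i for i, l in enumerate(frequent_lengths[:8])}
        self.symbols = {}          # 7-tuple of characteristics -> symbol id

    def map(self, pkt):
        key = (pkt['direction'],
               port_type(pkt['local_port']),
               port_type(pkt['remote_port']),
               self.length_bins.get(pkt['length'], 8),
               pkt['tcp_flags'],
               pkt['protocols'],
               iat_bin(pkt['iat']))
        return self.symbols.setdefault(key, len(self.symbols))
```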

V-B Detection Process

Fig. 7(a) shows an example of the occurrence frequencies of individual packet symbols for benign and attack traffic (as generated by the Mirai malware) for Edimax smart power plugs. It can be seen that using packet symbols alone to distinguish between benign and attack traffic is not sufficient, as both traffic types contain packet types that are mapped to the same symbols. Our detection approach is therefore based on estimating the likelihood of observing individual packet types given the sequence of preceding packets. The rationale behind this approach is the observation that IoT device communications usually follow particular characteristic patterns. Traffic generated by IoT malware, however, does not follow these patterns and can therefore be detected.

Fig. 7: Packet symbol occurrence frequencies (a) and occurrence probability estimates (b) for benign and attack traffic for Edimax smart power plugs

We thus use the detection model to calculate an occurrence probability estimate for each packet symbol s_i given the sequence of the k preceding symbols, i.e.,

P(s_i) = P(s_i | s_{i−k}, s_{i−k+1}, ..., s_{i−1})    (3)

Parameter k is a property of the used GRU network and denotes the length of the lookback history, i.e., the number of preceding symbols that the GRU takes into account when calculating the probability estimate. From Fig. 7(b) we can see that these probability estimates are on average higher for packets belonging to benign traffic patterns and lower for packets generated by malware on an infected device, which can therefore be flagged as anomalous.

Definition 1 (Anomalous packets)

Packet p_i mapped to packet symbol s_i is anomalous if its occurrence probability estimate is below the detection threshold δ, i.e., if

P(s_i | s_{i−k}, ..., s_{i−1}) < δ    (4)

We performed an extensive empirical analysis of the probability estimates provided by device-specific detection models for both benign and attack traffic for the datasets described in Sect. IX and could determine a value for δ that provides a good separation between benign and attack traffic, as can also be seen in Fig. 7(b). An example of our approach is shown in Fig. 8. Malicious packets (represented by symbol '#0') receive very low probability estimates, distinguishing them clearly from benign packets. However, their presence also lowers the estimate of the subsequent benign packet '#41', since the sequence of packets preceding it is unknown to the detection model.

Fig. 8: Occurrence probabilities of 15 packets from Edimax Plug when Mirai was in standby stage. The red ’#0’ denotes the malicious packets.

Triggering an anomaly each time an anomalous packet is observed would lead to numerous false positive detections, as benign traffic may also contain noise that is not covered by the GRU model and will therefore receive low occurrence probability estimates. An anomaly is therefore triggered only if a significant number of packets in a window of consecutive packets are anomalous.

Definition 2 (Anomaly triggering condition)

Given a window W of w consecutive packets represented by the symbol sequence (s_1, ..., s_w), we trigger an anomaly alarm if the fraction of anomalous packets in W is larger than the anomaly triggering threshold γ, i.e., if

|{ s_j ∈ W : P(s_j | s_{j−k}, ..., s_{j−1}) < δ }| / w > γ    (5)
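Putting Definitions 1 and 2 together, the detection step reduces to a simple loop over probability estimates; the values of δ, γ and the window size below are placeholders, since the paper determines them empirically:

```python
def detect(probabilities, delta=0.01, gamma=0.2, window=250):
    """Return the start indices of packet windows that trigger an alarm.

    probabilities: per-packet occurrence probability estimates produced by the
    GRU model for one device's symbol sequence. A packet is anomalous if its
    estimate is below delta (Def. 1); a window raises an alarm if the fraction
    of anomalous packets exceeds gamma (Def. 2)."""
    alarms = []
    for start in range(0, len(probabilities), window):
        w = probabilities[start:start + window]
        if w and sum(p < delta for p in w) / len(w) > gamma:
            alarms.append(start)
    return alarms
```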

VI Federated Learning Approach

The GRU models are learned using traffic collected at several Security Gateways, each monitoring a client IoT network. Each Security Gateway observing a device of a particular type contributes to training its anomaly detection model. We take a federated learning approach to implement the distributed learning of models from several clients. Federated learning is a communication-efficient and privacy-preserving learning approach suited for distributed optimization of Deep Neural Networks (DNN)  [28, 29]. In federated learning, clients do not share their training data but rather train a local model and send model updates to a centralized entity which aggregates them. Federated learning is chosen because it is suitable [30] for scenarios where:


  • data are massively distributed, so that there is a large number of clients each having a small amount of data. IoT devices typically generate little traffic, which means each client alone can provide only little data.

  • contributions from clients are imbalanced. In our system, the training data available at each Security Gateway depends on the duration that an IoT device has been in the network and the amount of interaction it has had, which varies largely between clients.

VI-A Learning Process

Fig. 9: Overview of federated learning process

The federated training process is illustrated in Fig. 9. Each Security Gateway having devices of a particular type in its network requests a detection profile for this type from IoT Security Service in Step 1 and receives an initial GRU model for this type in Step 2. At the start of DÏoT this model is random; otherwise it has already been trained through several rounds of the following process. In Step 3 the global model is re-trained locally by each Security Gateway with traces collected by monitoring the communication of its devices. Then, in Step 4, the local updates made to the model by each Security Gateway are reported to IoT Security Service, which in Step 5 aggregates them to improve the global model. Finally, the updated global model is pushed back to Security Gateway and used for anomaly detection (Step 6). The re-training of the model is performed on a regular basis to improve its accuracy.

To train our models we adopt an approach introduced by McMahan et al. [30]. Each client (Security Gateway) trains its GRU model locally for several epochs before reporting updates to IoT Security Service. This limits the communication overhead by reducing the number of updates to send to IoT Security Service. To the best of our knowledge, we are the first to employ a federated learning approach for anomaly-detection-based intrusion detection.
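For illustration, a sketch of the server-side aggregation step following the federated averaging scheme of McMahan et al. [30], weighting each client's update by its number of local training samples (the helper name and weighting details are our rendering, not DÏoT's exact code):

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Aggregate per-client GRU weights into a new global model.

    client_weights: one list of numpy arrays per client (e.g. from
                    keras Model.get_weights()).
    client_sizes:   number of local training samples per client, used to
                    weight each contribution."""
    total = float(sum(client_sizes))
    new_global = []
    for layer_weights in zip(*client_weights):           # iterate layer by layer
        avg = sum(w * (n / total) for w, n in zip(layer_weights, client_sizes))
        new_global.append(avg)
    return new_global

# One training round, conceptually:
#   1. the server sends the current global weights to each Security Gateway,
#   2. each gateway trains locally for a few epochs and returns its weights
#      together with its local sample count,
#   3. the server computes federated_average(...) and redistributes the result.
```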

VI-B Federated Learning Setup

We implemented the federated learning algorithm utilizing the flask [31] and flask_socketio [32] libraries for the server-side application and the socketIO-client [33] library for the client-side application. The socketIO-client uses the gevent asynchronous framework [34] which provides a clean API for concurrency and network related tasks. We used the Keras [35] library with Tensorflow backend to implement the GRU network with the parameters selected in Sect. VII-B.
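As an illustration of the server side, a stripped-down flask / flask_socketio endpoint that collects model updates from Security Gateways; the event names and payload format are assumptions:

```python
from flask import Flask
from flask_socketio import SocketIO, emit

app = Flask(__name__)
socketio = SocketIO(app)

updates, sizes = [], []      # local model updates collected in the current round

@socketio.on('model_update')
def handle_model_update(msg):
    """Receive one Security Gateway's local update and acknowledge it."""
    updates.append(msg['weights'])
    sizes.append(msg['num_samples'])
    emit('ack', {'round': msg['round']})

if __name__ == '__main__':
    socketio.run(app, host='0.0.0.0', port=5000)
```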

VII Experimental Setup

To evaluate DÏoT, we apply it to the use case of detecting real-life IoT malware. We selected Mirai for this purpose, since its source code is publicly available and several other malware variants like Persirai [11] or Hajime [2] have been implemented using the same code base or closely follow similar behavior. This makes Mirai a highly relevant baseline for IoT malware behavior.

VII-A Datasets

We collected extensive datasets about the communication behavior of IoT devices in laboratory and real-world deployment settings. The monitored devices included 33 typical consumer IoT devices like IP cameras, smart power plugs and light bulbs, sensors, etc. The devices were mapped by our device-type identification method to 23 unique device types. The detailed list of devices and their assignment to device types can be found in Tables VII and IX in App. A and C. We collected datasets by setting up a laboratory network as shown in Fig. 10, using hostapd on a laptop running Kali Linux to create a Security Gateway (SGW) acting as an access point with WiFi and Ethernet interfaces to which IoT devices were connected. On the SGW we collected all network traffic packets originating from the monitored devices using tcpdump.

Fig. 10: Laboratory network setup

VII-A1 Activity Dataset

A key characteristic of IoT devices is that they expose only a few distinct actions accessible to users, e.g., ON, OFF, ADJUST, etc. To capture the communication patterns related to user interactions with IoT devices, we collected a dataset encompassing all such actions being invoked on the respective IoT devices. We repeatedly performed the actions shown in Tab. III. Each action was repeated 20 times (a 20-fold repetition chosen as a rule of thumb). To capture also less intensive usage patterns, the dataset was augmented with longer measurements of two to three hours, during which actions were triggered only occasionally. This dataset contains data from 33 IoT devices, out of which 27 have both action and standby data. Six devices (lighting and home automation hubs) have standby data only, because they do not provide meaningful actions that users could invoke.

Background traffic. In order to identify the inherent communication patterns of IoT devices, we collected a dataset characterizing the background traffic IoT devices generate while no explicit actions are invoked. This dataset captures any communications resulting from actions devices execute in standby mode, like, e.g., heartbeat messages or regular status updates or notifications.

Category (count) Typical actions
IP cameras (6) START / STOP video, adjust settings, reboot
Smart plugs (9) ON, OFF, meter reading
Sensors (3) trigger sensing action
Smart lights (4) turn ON, turn OFF, adjust brightness
Actuators (1) turn ON, turn OFF
Appliances (2) turn ON, turn OFF, adjust settings
Routers (2) browse amazon.com
Hub devices (6) no actions
TABLE III: Actions for different IoT device categories

VII-A2 Deployment Dataset

To evaluate DÏoT in a realistic smart home deployment setting, in particular with regard to how many false alarms it raises, we installed 14 different smart home IoT devices in several different domestic deployment scenarios. (The number of devices was limited, as the driver of the used WiFi interface allowed at most 16 devices to reliably connect to it simultaneously.) This deployment involved real users and collected communication traces of these devices under realistic usage conditions. We used the same set-up as in the laboratory network for the domestic deployment, albeit we excluded the attack server. Users used and interacted with the IoT devices as part of their everyday life. Packet traces were collected continuously during one week.

VII-A3 Attack Dataset

For evaluating the effectiveness of DÏoT at detecting attacks, we collected a dataset comprising malicious traffic of IoT devices infected with Mirai malware [1, 3] in all four different attack stages discussed below: pre-infection, infection, scanning and DoS attacks (as a monetization stage). Additionally, we collected traffic when Mirai was in a standby mode, i.e., not performing any attack but awaiting commands from its Command & Control server.

Among the 33 experimental devices, we found five devices that are vulnerable to the Mirai malware. The Attack dataset was collected from those five devices: D-LinkCamDCS930L, D-LinkCamDCS932L, EdimaxPlug1101W, EdimaxPlug2101W and UbntAirRouter. This was done by installing the Command & Control, Loader and Listener server modules on the laboratory network for infecting target devices with Mirai and controlling them. Infection was achieved by using security vulnerabilities like easy-to-guess default passwords to open a terminal session to the device and by issuing appropriate commands to download the malware binary onto the device.

In the pre-infection stage, Loader sends a set of commands via telnet to the vulnerable IoT device to prepare its environment and identify an appropriate method for uploading the Mirai binary files. We repeated the pre-infection process 50 times for each device. During each run, around 900 pre-infection-related packets were generated.

After pre-infection, the infection stage commences, during which Loader uploads the Mirai binary files to the IoT device. It supports three upload methods: wget, tftp and echo (in this priority order). To infect the two D-Link cameras and the Ubnt router, Loader uses wget; on the Edimax plugs it resorts to tftp, as these tools are installed on the devices by default. We repeated the infection process 50 times for each device, each run generating approximately 700 data packets.

In the scanning stage we collected packets while the infected devices were actively performing a network scan in order to locate other vulnerable devices. Data collection was performed for five minutes per device, resulting in a dataset of more than 446,000 scanning data packets.

We extensively tested the DoS attack stage, utilizing all ten different DoS attack vectors (for details, see App. D) available in the Mirai source code [36]. We ran all attacks separately on all five compromised devices for five minutes each, generating more than 20 million packets of attack traffic in total.

Tab. IV summarizes the sizes and numbers of distinct packets and packet flows in the different datasets. While packet flows can’t be directly mapped to distinct device actions, they do provide a rough estimate of the overall level of activity of the targeted devices in the dataset.

Dataset (# devices)   Time (hours)   Size (MiB)   Flows       Packets
Activity (33)         165            465          115,951     2,087,280
Deployment (14)       98             578          95,518      2,286,697
Attack (5)            84             7,734        8,464,434   21,919,273
TABLE IV: Characteristics of the used datasets

VII-B Parameter Selection

Based on initial experiments with our datasets (Tab. IV) we inferred a lookback history length k that is sufficient to capture most communication interactions with sufficient accuracy. We used a GRU network with three hidden layers of 128 neurons each. The size of the input and output layers is device-type-specific and equal to the number of mapping symbols of the function m_t, i.e., the cardinality of S_t (cf. Sect. V-A). We learned 23 anomaly detection models, each corresponding to a device type identified using the method described in Sect. IV. Each anomaly detection model was trained with, and respectively tested on, communication from all devices matching the considered type.
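A sketch of such a per-device-type GRU model in Keras, with the stated topology of three hidden GRU layers of 128 neurons; the symbol vocabulary size |S_t| and the lookback length k are passed in as parameters since they are device-type-specific:

```python
from keras.models import Sequential
from keras.layers import Input, GRU, Dense

def build_detection_model(num_symbols, lookback):
    """GRU next-symbol predictor: input is a window of `lookback` one-hot
    encoded packet symbols, output is a probability distribution over the
    num_symbols possible next symbols."""
    model = Sequential([
        Input(shape=(lookback, num_symbols)),
        GRU(128, return_sequences=True),
        GRU(128, return_sequences=True),
        GRU(128),
        Dense(num_symbols, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy')
    return model

# Example (hypothetical sizes): a device type with 75 distinct symbols and a
# lookback window of 20 packets.
# model = build_detection_model(num_symbols=75, lookback=20)
```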

VII-C Evaluation Metrics

We use false positive and true positive rate (FPR and TPR) as measures of fitness. FPR measures the rate at which benign communication is incorrectly classified as anomalous by our method causing a false alarm to be raised. TPR is the rate at which attacks are correctly reported as anomalous. We seek to minimize FPR, since otherwise the system easily becomes unusable, as the user would be overwhelmed with false alarms. At the same time we want to maximize TPR so that as many attacks as possible will be detected by our approach.

Testing for false positives was performed by four-fold cross-validation for the device types in the Activity and Deployment datasets. The data were divided equally into four folds, using three folds for training and one for testing. To determine the FPR, we divided the testing dataset according to Def. 2 into windows of consecutive packets. Since the testing data contained only benign communications, any anomaly alarm triggered for a window indicated a false positive, whereas windows without alarms were counted as true negatives.

Testing for true positives was done by using the Activity and Deployment datasets as training data and the Attack dataset for testing, with the same settings as for false positive testing. Moreover, as the Attack dataset also contains benign traffic corresponding to normal operations of the IoT devices, we were interested in the average duration until detection. Therefore, in each window of consecutive packets we calculated the number of packets required until an anomaly alarm was triggered in order to estimate the average detection time. In terms of TPR, such windows were considered true positives, whereas windows without triggered alarms were considered false negatives.
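A small sketch of the window-level metric computation used in both tests (window labels and alarms are assumed to be available per window):

```python
def window_metrics(windows):
    """Compute window-level TPR and FPR.

    windows: list of (is_attack, alarm_raised) pairs, one per packet window."""
    tp = sum(a and r for a, r in windows)
    fn = sum(a and not r for a, r in windows)
    fp = sum((not a) and r for a, r in windows)
    tn = sum((not a) and not r for a, r in windows)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr
```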

VIII Device-Type Identification Evaluation

VIII-A Accuracy

To evaluate the accuracy of our device-type identification technique, we computed fingerprints (cf. Sect. IV-B) from the Activity dataset (including its background traffic). We obtained 6,224 fingerprints representing 33 IoT devices.

To assess the relevance of our automatically defined device types, we trained a kNN model from the fingerprints, following the method presented in Sect. IV-C. It defined 23 classes (device types). 16 devices were each assigned their own separate device type; the remaining 17 were aggregated into 7 device types. The assignment of devices to automatically defined device types is summarized in App. C. Different devices allotted to a given device type are always from the same manufacturer and have the same or a similar purpose (smart plugs / IP cameras / smart switches / sensors). For example, type#06 contains two instances of the same IP camera. It is worth noting that several devices connected to Security Gateway through an intermediary gateway would be considered a single device and would be allotted a single device type. Intermediary gateways are usually proprietary and connect devices from the same manufacturer that also have the same or a similar purpose (e.g., light bulbs). We conclude that our grouping is relevant for our anomaly detection system, since similar or identical devices from the same manufacturer are likely to have similar behavior that can be represented by a single anomaly detection model.

Fig. 11: Precision, recall and f1-score for identification of 23 device types (e.g., type#01).

We demonstrate the accuracy of device-type identification using a 4-fold stratified cross-validation. We randomly split our 6,224 fingerprints into four equal subsets while respecting the class (device type) distribution. We use three subsets for training our kNN identification model and test it on the remaining subset. This process is repeated four times to test each of the four subsets. We ran the cross-validation 10 times with random seeds. Figure 11 presents the precision, recall and f1-score for identifying each device type. All metrics reach over 0.95 for most devices. The overall accuracy of identification across all types is 0.982, showing its effectiveness. A confusion matrix in App. B presents detailed results for this experiment.

VIII-B Speed

We computed the time required for identifying the type of a device. This process is divided into three stages. The first stage consists of capturing the traffic generated by the device, which lasts for a fixed duration of 30 minutes. The second stage consists of pre-processing and extracting the fingerprint from the traffic capture (steps 1+2 in Fig. 2), which takes 52.6 ms on average. The third stage is the classification of the fingerprint using kNN, which takes 0.1 ms on average.

The duration of device identification is largely dominated by the time required for traffic capturing (30 minutes = 1,800,000 ms), which is about five orders of magnitude longer than any of the other stages. The duration of traffic capture is static regardless of the number of devices to identify by Security Gateway or the number of device types (classes) in the kNN model. Fingerprint extraction must be run for each device connected to a Security Gateway. Assuming that Security Gateway needs to be capable of identifying a few tens of IoT devices, running this process in parallel would take less than 1 second. The time for fingerprint classification using kNN increases linearly with the number of training samples in the kNN model. Assuming that the same number of instances is kept for every class in the model, the time for fingerprint classification increases linearly with the number of classes (device types) in the kNN model. Our model containing 23 device types takes 0.1 ms to classify a fingerprint. Thus managing thousands of device types would take less than 1 second, and device identification would still be largely dominated by the 30-minute traffic capture.

VIII-C Feature Importance

We computed scores for feature importance to evaluate the impact of our 33 features on device-type identification. Since kNN does not provide information about features most useful in classification, we used the ReliefF feature selection algorithm [23] to compute these scores. ReliefF is conceptually close to kNN since its feature scoring is based on the differences in feature values between nearest neighbor instance pairs.

Table I presents the importance score for each feature. All four period-duration features have high scores, which shows that IoT devices of different types have periodic flows with very different durations. The counts of periodic flows (f1-f2) are also highly relevant, meaning that IoT devices of different types have different numbers of periodic flows. The most relevant feature is the count of flows with a static source port (f7). This means that IoT devices are heterogeneous in the way they manage their periodic communications: some keep a connection open over time while others periodically re-initiate a new connection for the same flow. While some features have a low importance (e.g., f3-f4-f11-f12), they slightly improve the accuracy of device-type identification and we decided to keep them in our feature set. A large set of features (and some feature redundancy) also increases the resilience of machine learning based systems to adversarial machine learning attacks such as data poisoning [37].

VIII-D Learning Time

To show that our identification model can be learned quickly, we evaluate its accuracy with a varying amount of training data. As presented in Sect. IV-C, we selected a minimum number of components per class in kNN. Figure 12 depicts the increase in precision, recall and f1-score as we vary the size of the training set from 5 fingerprints per device (2.5 hours of monitoring) to 40 fingerprints per device (20 hours of monitoring). We see that all metrics increase quickly from 0.87 to 0.95 and then stabilize with a small gradient. This shows that after a few hours of monitoring, more training data does not significantly increase accuracy. The required time is likely even shorter considering that several Security Gateways contribute training data (fingerprints) for each device type in parallel. Learning an effective device identification model thus requires only a few hours of traffic monitoring globally.
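For reference, the learning-curve experiment can be sketched as follows: a fixed number of fingerprints per device type is sampled for training and the remaining fingerprints are used for testing. X, y, the neighborhood size k and the sampled sizes are placeholders rather than the exact experimental configuration.

```python
# Sketch of the learning-curve experiment: train on n fingerprints per device
# type and test on the rest (X, y, k and the sampled sizes are placeholders).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import f1_score

def learning_curve(X, y, sizes=(5, 10, 20, 30, 40), k=5, seed=0):
    rng = np.random.default_rng(seed)
    results = {}
    for n in sizes:
        train_idx = []
        for label in np.unique(y):
            idx = np.flatnonzero(y == label)
            train_idx.extend(rng.choice(idx, size=min(n, len(idx)), replace=False))
        train_idx = np.asarray(train_idx)
        test_idx = np.setdiff1d(np.arange(len(y)), train_idx)
        clf = KNeighborsClassifier(n_neighbors=k).fit(X[train_idx], y[train_idx])
        results[n] = f1_score(y[test_idx], clf.predict(X[test_idx]), average="macro")
    return results  # macro f1-score per training-set size
```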

Fig. 12: Precision, recall and f1-score increase with respect to training set size.

To summarize, we showed that our method for automatically learning device types is relevant on a large set of 33 IoT devices. We demonstrated that the identification technique is effective and accurate (98.2%) across all tested devices, even when using little training data, which makes it fast at identifying newly released IoT devices.

IX Intrusion Detection Evaluation

IX-A Accuracy

Fig. 13: ROC curve of TPR and FPR as a function of the detection threshold and the anomaly triggering threshold.

To determine appropriate values for the detection threshold and the anomaly triggering threshold, we evaluated the FPR using the Activity (33 devices) dataset and the TPR using the Attack (5 devices) dataset for a fixed window size of 250 packets. Fig. 13 shows the receiver operating characteristic (ROC) curve of FPR and TPR as a function of these parameters. All curves quickly reach over 0.9 TPR while keeping a very low FPR (<0.01), which is one of the main objectives of our approach. We therefore select threshold values that achieve a high TPR at <0.01 FPR.

Using these selected parameters on the Deployment (14 devices) dataset and the Attack (5 devices) dataset, we achieved the attack detection rates reported in Tab. V (95.6% TPR on average) and no false positives, i.e., 0% FPR, during one week of evaluation. These results show that DÏoT can successfully address challenge C3, reporting no false alarms in a real-world deployment setting. Tab. V shows the detailed performance of our system for the different attack scenarios (cf. Sect. VII). The time to detect an attack varies with the traffic intensity of the attack; the average detection delay over all tested attacks is 257 ms (cf. Tab. V). DÏoT detects an attack in the pre-infection stage after 223 packets, while Mirai generates more than 900 packets during pre-infection. This means DÏoT is able to detect the attack even before it proceeds to the infection stage.

The detection rate for DoS attacks is lower than for the other attack stages. However, all DoS attacks are eventually detected: DoS attacks have a high throughput (over 1,400 packets/s, cf. Tab. V), so we analyze around five windows of 250 packets per second at this rate. Given the TPR we achieve on DoS attacks, roughly four out of five windows are detected as anomalous and trigger an alarm. It is also worth noting that infected devices in standby mode are detected in 33.33% of cases, even though this activity is very stealthy (0.05 packets/s, cf. Tab. V).
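To illustrate the role of these two thresholds, the sketch below shows one plausible reading of the detection logic: each packet symbol whose model-estimated occurrence probability falls below the detection threshold counts as anomalous, and a 250-packet window raises an alarm when the fraction of anomalous packets exceeds the triggering threshold. The threshold values and function names are illustrative assumptions, not the exact implementation selected above.

```python
# Hedged sketch of window-based anomaly triggering: a packet symbol counts as
# anomalous when the probability the GRU model assigns to it falls below the
# detection threshold; a 250-packet window raises an alarm when the fraction
# of anomalous packets exceeds the triggering threshold. Threshold values are
# illustrative assumptions.
from typing import List, Sequence

WINDOW_SIZE = 250           # packets per evaluation window (Sect. IX-A)
DETECTION_THRESHOLD = 0.01  # assumed per-packet probability cutoff
TRIGGER_THRESHOLD = 0.5     # assumed anomalous fraction per window

def window_is_anomalous(symbol_probs: Sequence[float]) -> bool:
    """symbol_probs: model-estimated occurrence probability of each packet
    symbol in one window."""
    anomalous = sum(1 for p in symbol_probs if p < DETECTION_THRESHOLD)
    return anomalous / len(symbol_probs) > TRIGGER_THRESHOLD

def alarms(prob_stream: Sequence[float]) -> List[bool]:
    """Evaluate consecutive windows of WINDOW_SIZE packets and report which
    of them trigger an alarm."""
    return [window_is_anomalous(prob_stream[i:i + WINDOW_SIZE])
            for i in range(0, len(prob_stream) - WINDOW_SIZE + 1, WINDOW_SIZE)]
```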

Attack          packets/s.   det. time (ms)   TPR
Standby         0.05         4,051,889        33.33%
Pre-Infection   426.66       524              100.00%
Infection       721.18       272              93.45%
Scanning        752.60       166              100.00%
DoS             1,412.94     92               88.96%
Average         866.88       257±194          95.60%
TABLE V: Average detection times and detection rates of the analyzed Mirai attack stages

IX-B Efficiency of Federated Learning

Fig. 14: Evolution of TPR and FPR as we increase the number of clients in federated learning. TPR decreases slightly (-3%) while FPR reaches 0 (-21%) when using 15 clients.

We conducted a set of experiments to evaluate federated learning performance with different numbers of clients (ranging from 2 to 15) contributing to the training of the models. We fixed the number of epochs each client trains its local model per communication round and the number of communication rounds between clients and server such that the local models were trained for a total of 51 epochs. This was deemed sufficient since, in our initial experiments using a centralized learning setting, the models converged after approximately 50 epochs. Each client was allocated a randomized subset of training data from the Deployment dataset (ranging from 0.1% to 10% of the total training dataset size) and we evaluated the system's performance for different numbers of clients involved in building the federated model. We repeated the experiment three times for each device type, with random re-sampling of the training datasets. As expected, Fig. 14 shows that federated models with more participating clients achieve a better FPR, while the TPR deteriorates only slightly.
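For context, a generic federated-averaging round in the style of [30] can be sketched as follows: each client initializes its local model with the current global weights, trains for a few local epochs on its own traffic, and the server averages the resulting weights. The Keras-style model objects, the client data and the hyperparameters are placeholders, not our exact implementation.

```python
# Generic federated-averaging sketch in the style of FedAvg [30]: each client
# trains a local copy for a few epochs, the server averages the weights.
# Keras-style model objects, client data and hyperparameters are placeholders.
import numpy as np

def federated_round(global_weights, clients, local_epochs):
    """clients: list of (model, x_train, y_train) sharing one architecture."""
    client_weights = []
    for model, x, y in clients:
        model.set_weights(global_weights)                # start from global model
        model.fit(x, y, epochs=local_epochs, verbose=0)  # local training
        client_weights.append(model.get_weights())
    # Unweighted element-wise average of each weight tensor across clients
    # (FedAvg additionally weights clients by their amount of data).
    return [np.mean([w[i] for w in client_weights], axis=0)
            for i in range(len(global_weights))]

def federated_training(global_model, clients, rounds, local_epochs):
    weights = global_model.get_weights()
    for _ in range(rounds):                  # communication rounds with the server
        weights = federated_round(weights, clients, local_epochs)
    global_model.set_weights(weights)
    return global_model
```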

Federated learning provides better privacy for clients contributing to training, as they do not need to share their training data. However, this may result in a loss of accuracy compared to training the model in a centralized manner. To evaluate this possible loss, we trained three federated models using the entire training dataset divided among 5, 9 or 15 clients and compared them to a model trained in a centralized manner. Tab. VI shows a small decrease in TPR as we increase the number of clients, while the FPR remains constant at 0%. This small drop in TPR is not a concern, since a large number of packet windows would still trigger an alarm for any attack stage.

         Centralized     Federated learning
         learning        5 clients    9 clients    15 clients
FPR      0.00%           0.00%        0.00%        0.00%
TPR      95.60%          95.43%       95.01%       94.07%
TABLE VI: Effect of using federated learning compared to the centralized approach

IX-C Data Needed for Training

Fig. 15: Effect of training data size (in hours) on FPR

Fig. 15 shows an example of detection model performance for two Edimax smart plug devices (models SP-1101W and SP-2101W) as a function of the amount of data used for training the model. We divided the 7-day Deployment dataset into one-hour data chunks and randomly sampled different numbers of chunks for training the model, gradually increasing the training dataset size. The figure shows that the FPR decreases noticeably as the training dataset grows. More importantly, the model needs less than 25 hours of data to achieve FPR = 0. This shows that our detection model needs little data for training, which means DÏoT can address challenge C4. Moreover, with our federated learning approach leveraging several clients that contribute to training the model, each client needs only a small amount of data, i.e., 2.5 hours if ten clients are involved. This justifies assumption A1 as mentioned in Sect. III-A.

IX-D Efficiency of Device-Type-Specific Models and Scalability

Traditional anomaly detection approaches that use a single model of benign behavior easily suffer from increasing false positive rates or decreasing sensitivity when the number of different types of behavior (i.e., device types) captured by the model grows. This makes them unsuitable for real-world deployments with hundreds or thousands of different device types. Our solution does not have this drawback, as it uses a dedicated detection model for each device type (details in Sect. V). Each of these models focuses solely on the characteristic behavior of a single device type, resulting in more specific and accurate behavioral models, independent of the number of device types handled by the system. To evaluate the benefit of device-type-specific anomaly detection models over a single model for all devices, we trained a single model on the whole Deployment dataset using 4-fold cross-validation and evaluated its detection accuracy on the Attack dataset. The result is as expected: FPR increases from 0% to 0.67% while TPR increases from 95.6% to 97.21%. However, as discussed in Sect. III, a high false alarm rate makes an anomaly detection system impractical: with an FPR of 0.67%, our deployment setup would trigger around eight alarms per day, and a smart home with dozens of devices could see hundreds of false alarms per day.

IX-E Performance

We evaluated the processing performance of GRU without specific performance optimizations on a laptop and a desktop computer. The laptop ran Ubuntu Linux 16.04 with an Intel Core i7-4600 CPU and 8 GB of memory, whereas the desktop ran Ubuntu Linux 18.04 with an Intel Core i7-7700 CPU, 8 GB of memory and a Radeon RX 460 graphics card (GPU). Average processing time per symbol (packet) for prediction was   for the desktop utilizing its GPU and   when executed on the laptop with CPU. On average, training a GRU model for one device type took 26 minutes on the desktop and 71 minutes on the laptop when considering a week's worth of data from the Deployment dataset. We conclude that model training is feasible in real deployment scenarios, as training will in any case be done gradually as data are collected from the network over longer periods of time.
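For orientation, a minimal Keras [25] GRU model for predicting the next packet symbol could look like the sketch below; the layer sizes, sequence length and symbol vocabulary size are illustrative assumptions and not the configuration benchmarked above.

```python
# Minimal Keras [25] sketch of a GRU next-symbol predictor; layer sizes,
# sequence length and vocabulary size are illustrative assumptions.
from tensorflow.keras import layers, models

def build_gru_model(vocab_size, seq_len=20, embed_dim=16, hidden=128):
    model = models.Sequential([
        layers.Input(shape=(seq_len,)),                  # sequence of symbol ids
        layers.Embedding(vocab_size, embed_dim),         # map symbols to vectors
        layers.GRU(hidden),                              # summarize the sequence
        layers.Dense(vocab_size, activation="softmax"),  # next-symbol probabilities
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    return model
```

The softmax output provides the estimated occurrence probability of each possible next packet symbol, which the detection stage then compares against the detection threshold (Sect. IX-A).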

X Effectiveness

X-A Generalizability of Device Fingerprinting

The features that compose our device fingerprint were defined to model periodic flows and to differentiate IoT devices having different periodic flows. This feature definition and the choice of a specific classifier, kNN, were motivated in Sect. IV. As in any machine learning application, the efficacy of a feature set and a classifier can only be demonstrated for a specific task and a specific dataset (no free lunch theorem [38]).

To ensure generalizability, we defined the fingerprint features and selected the kNN classifier without prior knowledge about the communications of specific IoT devices. Consequently, our features are independent of any dataset, and in particular of the data we later processed in our experiments. Data-independent features and a data-independent choice of machine learning method support the generalizability of the fingerprinting technique [39]. Having assessed our technique on a large set of 33 IoT devices (IP cameras, sensors, a coffee machine, etc.) representative of typical smart home IoT devices, we expect that the high efficacy (98.2% accuracy) observed in our evaluation (cf. Sect. VIII) is likely to generalize to other IoT devices.

Some IoT devices, especially those that operate on battery power, may be kept turned off by default and activated only on explicit user triggers. Such devices naturally do not have periodic communications; consequently, techniques like DÏoT are not effective at identifying them. However, these devices are also not critical from a security perspective: for example, they are unlikely to be discovered by IoT malware scanning the local network.

X-B Generalizability of Anomaly Detection

Although we focused our evaluation on Mirai [1], the most well-known IoT malware to date, DÏoT is likely also effective at detecting other botnet malware such as Persirai [11], Hajime [2], etc. DÏoT's anomaly detection leverages deviations in the behavior of infected IoT devices caused by the malware; since such malware must generate traffic, e.g., for scanning, propagation or attacks, similar deviations from a device's limited legitimate behavior will be observable.

X-C Evolution of IoT Device Behavior

The behavior of an IoT device type can evolve, e.g., due to firmware updates that introduce new functionality. This modifies the device's behavior and may trigger false alarms for legitimate communication. We prevent such false alarms by correlating anomaly reports from all Security Gateways at the IoT Security Service. Assuming firmware updates propagate to many client networks in a short time, alarms reported by a large number of Security Gateways for the same device type within a short period can be canceled, triggering re-learning of the corresponding device identification and anomaly detection models to adapt to the new behavior. To ensure that sudden widespread outbreaks of an IoT malware infection campaign are not erroneously interpreted as firmware updates, the canceling of an alarm can be confirmed by a human expert at the IoT Security Service. This represents a small burden, as the roll-out of firmware updates is a relatively rare event.
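As a sketch of how this correlation rule could look at the IoT Security Service, the snippet below suppresses alarms (and would schedule re-learning) for a device type when many distinct Security Gateways report it within a short time window. The window length, the gateway-count threshold and the data layout are illustrative assumptions; in practice the decision can additionally be confirmed by a human expert as described above.

```python
# Hedged sketch of the alarm-correlation rule at the IoT Security Service:
# when many distinct Security Gateways report anomalies for the same device
# type within a short time window, the alarms are suppressed and re-learning
# is scheduled. Window length and gateway threshold are assumed values.
from collections import defaultdict

CORRELATION_WINDOW_S = 3600   # assumed correlation window (1 hour)
MIN_REPORTING_GATEWAYS = 100  # assumed "large number" of gateways

def firmware_update_suspects(reports):
    """reports: list of (timestamp_s, gateway_id, device_type) alarm reports."""
    if not reports:
        return set()
    latest = max(t for t, _, _ in reports)
    recent = defaultdict(set)  # device_type -> gateways reporting recently
    for t, gateway, device_type in reports:
        if latest - t <= CORRELATION_WINDOW_S:
            recent[device_type].add(gateway)
    # Device types reported by many gateways at once look like firmware
    # updates: suppress their alarms and schedule model re-learning.
    return {d for d, gws in recent.items() if len(gws) >= MIN_REPORTING_GATEWAYS}
```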

X-D Spoofing Device Fingerprint

A compromised device can attempt to modify its background traffic so that its fingerprint changes and it gets identified as another device type. This is unlikely to happen since fingerprinting is a one-time operation performed when a new IoT device is detected in the network. According to assumption A2 (Sect. III-B), devices are not yet compromised when installed in a network.

Spoofing a targeted device fingerprint requires the attacker to generate new periodic communication and to disable existing periodic communication. The latter impacts the functionality of the device, which may then be noticed as compromised by its user (e.g., missing periodic reports) or its cloud service provider (e.g., missing periodic heartbeat signals). In addition, spoofing a device-type fingerprint only results in the device being assigned a different anomaly detection model than the one it was supposed to get. First, this means that the device will still have a restricted communication behavior, as defined by the anomaly detection model of the spoofed type (although this model may of course be less restrictive). Second, communication related to the legitimate functionality of the device will trigger anomalies, as it is likely not included in the anomaly detection profile of the spoofed type.

X-E Spoofing MAC Address

DÏoT anomaly detection is based on monitoring layer-2 traffic involving a particular device, identified by its MAC address. An adversary who has compromised a device can attempt to evade identification by spoofing the MAC address in the packets it sends. MAC address spoofing can be mitigated using additional techniques for fingerprinting hardware interfaces on wireless [14, 17] and wired connections [40]. These build a unique signature for the packets sent by a device based on hardware characteristics; such fingerprints are difficult to spoof [41]. Alternatively, secure association protocols like WiFi Protected Setup (WPS) [42] can be used to associate IoT devices with the Security Gateway. Such association protocols require user involvement (e.g., physically pushing a button on the gateway) to associate a new device with the access point. The association results in a device-specific shared key that the gateway can subsequently use to authenticate the device. This prevents rogue devices from connecting to the network by spoofing the MAC address of a device already associated with the Security Gateway.

X-F Mimicking Legitimate Communication

An adversary that has compromised an IoT device can attempt to mimic the device's legitimate communication patterns in order to remain undetected. However, since the device-type-specific detection model is restricted to the (relatively limited) functionality of the IoT device, it is in practice very difficult for the adversary to mimic legitimate communication while simultaneously achieving a malicious purpose, e.g., scanning or flooding. Any change in packet flow semantics is likely to change both the characteristics of packets (protocol, packet size, port, etc.) and their ordering, which are both used for detecting anomalies in the packet sequence. Moreover, adversaries would need to know the device-type-specific communication patterns in order to mimic them. This makes it significantly harder to develop large-scale IoT malware that affects a wide range of different IoT device types in the way that, e.g., Mirai does.

X-G Adversarial Machine Learning

Adversarial examples. If an adversary manages to compromise an IoT device while remaining undetected, it can attempt to 'poison' the training process of the system by forging packets as adversarial examples that are specifically meant to influence the learning of the model such that malicious activities are not detected. Techniques exist for forging adversarial examples against neural networks [43]. However, these apply to images [44, 45] and audio inputs [46, 47], where objective functions for the underlying optimization problem are easy to define.

Forging an adversarial example consists of finding minimal modifications to an input x of class c1 such that the modified input x' is classified as a different class c2. In our case this would mean, for example, that a malicious packet is incorrectly classified as a benign one. In contrast to image or audio processing, however, our features (symbols) are not raw inputs but are derived from packet properties. First, this means that modifications are computed on our symbolic representation of packet sequences, which is difficult to translate back into real packets in a way that preserves their utility for the adversary, i.e., that realizes the adversarial functionality required for malicious activities like scanning or DoS. Second, it is difficult to define the distance objective to minimize in order to achieve "small modifications", since changing the value of a single packet characteristic (protocol, port, etc.) can alter the semantics of a packet entirely.

Poisoning federated learning. For initial model training, we can assume the training data contain only legitimate network traffic, as devices are assumed to be benign initially (assumption A1, Sect. III-A). However, the federated setting can be subject to poisoning attacks during re-training, where the adversary uses adversarial examples as described above to corrupt the anomaly detection model so that it eventually accepts malicious traffic as benign [48] (or vice versa). Techniques have been developed for preventing poisoning attacks by using local outlier-detection-based filtering of adversarial examples to pre-empt model poisoning [49].

In the scope of this paper we assume that the Security Gateway is not compromised by the adversary (assumption A2, Sect. III-A). However, since a malicious user can have physical access to their Security Gateway, it is conceivable that they could compromise it in order to stage a poisoning attack against the system using adversarial examples. In this case, local filtering of adversarial examples is not possible, as it cannot be enforced by the compromised Security Gateway. We are therefore currently focusing our ongoing research on poisoning mitigation approaches applied at the IoT Security Service. These include more robust learning approaches that are less sensitive to adversarial examples and 'average out' their effects, as well as approaches similar to, e.g., Shen et al. [50], where malicious model updates are detected before they are incorporated into the global detection models.

XI Related Work

XI-A Anomaly Detection in IoT Networks

Several solutions have been proposed for the detection and prevention of intrusions in IoT networks [51, 52, 53], sensor networks [9] and industrial control systems [54, 55]. SVELTE [53] is an intrusion detection system for protecting IoT networks from already known attacks. It adapts existing detection techniques to IoT-specific protocols, e.g., 6LoWPAN. In contrast, DÏoT performs dynamic detection of unknown attacks and models only legitimate network traffic. Jia et al. [52] proposed a context-based system to automatically detect sensitive actions in IoT platforms. This system is designed for patching vulnerabilities in appified IoT platforms such as Samsung SmartThings. It does not adapt to multi-vendor IoT systems, whereas DÏoT is platform independent.

Detecting anomalies in network traffic has a long history [56, 57, 7, 58, 8, 59, 60]. Existing approaches rely on analyzing single network packets [57, 7] or clustering large numbers of packets [8, 9] to detect intrusions or compromised services. Some works have proposed, as we do, to model communication as a language [55, 59]. For instance, the authors of [59] derive finite state automata from layer 3-4 communication protocol specifications; monitored packets are processed by the automaton to detect deviations from the protocol specification or abnormally high usage of specific transitions. Such automata can only model short sequences of packets, while we use GRU to model longer sequences, which enables the detection of stealthy attacks. Also, modeling protocol specifications is coarse-grained and leaves room for circumventing detection; in contrast, we use finer-grained features for modeling packets, which are difficult to forge while preserving the adversarial utility of malicious packets. Finally, previous work did not tackle the problem of gathering data for training anomaly detection models, which is a tedious and long task considering the large number of IoT devices. DÏoT integrates a crowdsourced federated learning solution to address the training of anomaly detection models.

Lately, recurrent neural networks (RNN) have been used for several anomaly detection purposes. Most applications leverage long short-term memory (LSTM) networks for detecting anomalies in time series [61], aircraft data [62] or system logs [63]. A closely related application is the use of deep belief networks for mining DNS log data to detect infections in enterprise networks [64]. In contrast to these works, DÏoT uses a different flavor of RNN, namely GRU, for anomaly detection. Furthermore, previous security applications [63, 64] targeted offline analysis of log data, while DÏoT operates in real time, detecting anomalies in live network traffic.

XI-B Device-Type Identification

Early work in wireless communication fingerprinting targeted the identification of hardware- and driver-specific characteristics [15, 16, 65]. IoT-oriented device identification techniques leverage sensor-specific features [66, 67, 68, 69] to uniquely identify a device. Our identification technique is positioned between the former and latter approaches, providing the right granularity to passively identify device types.

Some solutions address device-type identification with the same granularity as we do [70, 71, 21, 72, 73], while considering different definitions of "type". GTID [21] identifies the make and model of a device by analyzing the inter-arrival time of packets sent for a targeted type of traffic (e.g., Skype, ICMP, etc.); it requires a lot of traffic collected over several hours to identify a device's type. Aksu et al. [72] also model the inter-arrival time of Bluetooth packets to identify different models of wearable devices from a smartphone. Maiti et al. [20] introduced a device-type identification technique relying on the analysis of encrypted WiFi traffic: a Random Forest classifier is trained with features extracted from a long sequence of WiFi frames. The technique was evaluated on 10 IoT devices and required at least 30,000 frames to be effective; in standby mode an IoT device can take days to generate such volumes of traffic. IoT Sentinel [13] leverages the burst of network traffic typical for the setup phase of an IoT device to identify its type. While accurate and requiring only two minutes of monitoring, IoT Sentinel only operates when a device is first installed in a network. Meidan et al. [71] analyze TCP sessions to identify generic types of IoT devices, e.g., smoke sensor, baby monitor, etc. The observation of at least 20 TCP sessions was required to reach acceptable accuracy for 17 devices. The authors reported that one third of their IoT devices did not produce any TCP sessions without user interaction (i.e., in standby mode), and for the remaining two thirds the mean inter-arrival time of TCP sessions was up to 5 minutes, requiring over an hour and a half to identify a device. Guo and Heidemann [73] use the same intuition as we do to identify IoT devices, namely that IoT devices periodically connect to specific services on the Internet. They identify the server names and IP addresses that a known IoT device connects to on the Internet; this information is later used to identify unknown devices if they connect to the same IP addresses. A limitation of this approach is that different IoT devices from the same manufacturer often connect to the same servers, which produces collisions between device types from that manufacturer. Also, many IoT device manufacturers leverage cloud services such as Amazon for hosting their services [70], which can also produce collisions.

State-of-the-art methods for device-type identification are supervised and require labeled data for training. DÏoT is not restricted to a finite set of pre-learned device types: it creates abstract device types, learns their fingerprints and adapts autonomously when new types are discovered. DÏoT is also not restricted to a specific type of dense network traffic. It is the first technique to identify IoT device types based on their periodic background communication. Consequently, and in contrast to previous work, it identifies the type of an IoT device under any state of operation.

Some security solutions for the IoT with a distributed design close to DÏoT's have been proposed commercially, e.g., IoT Guardian from Zingbox [74]. While relying on an unsupervised device identification technique, IoT Guardian does not propose any concrete implementation for it. Moreover, IoT Guardian relies on partial deep packet inspection, which prevents it from being used on encrypted communications. DÏoT does not have such limitations.

XII Conclusions

In this paper we introduced DÏoT, a self-learning system for detecting compromised devices in IoT networks. Our solution relies on novel automated techniques for device-type identification and device-type-specific anomaly detection. DÏoT does not require any human intervention or labeled data to operate: it learns device-type identification models and anomaly detection models autonomously, using unlabeled crowdsourced data captured in client IoT networks. We evaluated the accuracy of DÏoT's device-type identification on a large dataset comprising 33 real-world IoT devices, showing that it quickly (within a few hours) learns accurate (98.2%) identification models. We demonstrated the efficacy of the anomaly detection component in detecting a large set of malicious behaviors from devices infected by the Mirai malware: DÏoT detected 95.6% of attacks, within 257 ms on average, without raising any false alarm when evaluated in a real-world deployment.

Acknowledgment

This work was supported in part by the Intel Collaborative Research Institute for Collaborative Autonomous and Resilient Systems (ICRI-CARS) and by the SELIoT project and the Academy of Finland under the WiFiUS program (grant 309994). We would like to thank Cisco Systems, Inc. for their support of this work.

References

  • [1] M. Antonakakis, T. April, M. Bailey, M. Bernhard, E. Bursztein, J. Cochran, Z. Durumeric, J. A. Halderman, L. Invernizzi, M. Kallitsis, D. Kumar, C. Lever, Z. Ma, J. Mason, D. Menscher, C. Seaman, N. Sullivan, K. Thomas, and Y. Zhou, “Understanding the mirai botnet,” in 26th USENIX Security Symposium (USENIX Security 17).   Vancouver, BC: USENIX Association, 2017, pp. 1093–1110.
  • [2] S. Edwards and I. Profetis, “Hajime: Analysis of a decentralized internet worm for IoT devices,” Rapidity Networks, Tech. Rep., 2016.
  • [3] C. Kolias, G. Kambourakis, A. Stavrou, and J. Voas, “DDoS in the IoT: Mirai and other botnets,” Computer, vol. 50, no. 7, pp. 80–84, 2017.
  • [4] Radware, “BrickerBot results in PDoS attack,” https://security.radware.com/ddos-threats-attacks/brickerbot-pdos-permanent-denial-of-service/.
  • [5] N. Hadar, S. Siboni, and Y. Elovici, “A lightweight vulnerability mitigation framework for IoT devices,” in Proceedings of the 2017 Workshop on Internet of Things Security and Privacy, ser. IoTS&P ’17.   New York, NY, USA: ACM, 2017, pp. 71–75. [Online]. Available: http://doi.acm.org/10.1145/3139937.3139944
  • [6] R. Doshi, N. Apthorpe, and N. Feamster, “Machine learning ddos detection for consumer internet of things devices,” CoRR, vol. abs/1804.04159, 2018. [Online]. Available: http://arxiv.org/abs/1804.04159
  • [7] C. Krügel, T. Toth, and E. Kirda, “Service specific anomaly detection for network intrusion detection,” in Proceedings of the 2002 ACM symposium on Applied computing.   ACM, 2002, pp. 201–208.
  • [8] L. Portnoy, E. Eskin, and S. Stolfo, “Intrusion detection with unlabeled data using clustering,” in In Proceedings of ACM CSS Workshop on Data Mining Applied to Security, 2001.
  • [9] S. Rajasegarar, C. Leckie, and M. Palaniswami, “Hyperspherical cluster based distributed anomaly detection in wireless sensor networks,” Journal of Parallel and Distributed Computing, vol. 74, no. 1, pp. 1833–1847, 2014.
  • [10] B. Krebs, “KrebsOnSecurity hit with record DDoS,” https://krebsonsecurity.com/2016/09/krebsonsecurity-hit-with-record-ddos/.
  • [11] T. Yeh, D. Chiu, and K. Lu, “Persirai: New internet of things (IoT) botnet targets IP cameras,” TrendMicro, https://blog.trendmicro.com/trendlabs-security-intelligence/persirai-new-internet-things-iot-botnet-targets-ip-cameras/.
  • [12] Y. M. P. Pa, S. Suzuki, K. Yoshioka, T. Matsumoto, T. Kasama, and C. Rossow, “IoTPOT: A novel honeypot for revealing current IoT threats,” Journal of Information Processing, vol. 24, no. 3, pp. 522–533, 2016.
  • [13] M. Miettinen, S. Marchal, I. Hafeez, N. Asokan, A. Sadeghi, and S. Tarkoma, “IoT Sentinel: Automated Device-Type Identification for Security Enforcement in IoT,” in Proc. 37th IEEE International Conference on Distributed Computing Systems (ICDCS 2017), Jun. 2017.
  • [14] T. Kohno, A. Broido, and K. C. Claffy, “Remote physical device fingerprinting,” IEEE Trans. Dependable Secure Comput., vol. 2, no. 2, pp. 93–108, April 2005.
  • [15] J. Cache, “Fingerprinting 802.11 implementations via statistical analysis of the duration field,” Uninformed, vol. 5, 2006.
  • [16] J. Franklin, D. McCoy, P. Tabriz, V. Neagoe, J. Van Randwyk, and D. Sicker, “Passive data link layer 802.11 wireless device driver fingerprinting,” in USENIX Security Symposium.   USENIX, 2006.
  • [17] V. Brik, S. Banerjee, M. Gruteser, and S. Oh, “Wireless device identification with radiometric signatures,” in International Conference on Mobile Computing and Networking.   ACM, 2008, pp. 116–127.
  • [18] V. Costan and S. Devadas, “Intel SGX explained.” IACR Cryptology ePrint Archive, vol. 2016, p. 86, 2016.
  • [19] G. Dessouky, S. Zeitouni, T. Nyman, A. Paverd, L. Davi, P. Koeberl, N. Asokan, and A.-R. Sadeghi, “LO-FAT: Low-overhead control flow attestation in hardware,” in Design Automation Conference (DAC), 2017 54th ACM/EDAC/IEEE.   IEEE, 2017, pp. 1–6.
  • [20] R. R. Maiti, S. Siby, R. Sridharan, and N. O. Tippenhauer, “Link-layer device type classification on encrypted wireless traffic with cots radios,” in European Symposium on Research in Computer Security.   Springer, 2017, pp. 247–264.
  • [21] S. V. Radhakrishnan, A. S. Uluagac, and R. Beyah, “GTID: A technique for physical device and device type fingerprinting,” IEEE Transactions on Dependable and Secure Computing, vol. 12, no. 5, pp. 519–532, 2015.
  • [22] S. Winograd, “On computing the discrete fourier transform,” Mathematics of computation, vol. 32, no. 141, pp. 175–199, 1978.
  • [23] I. Kononenko, E. Šimec, and M. Robnik-Šikonja, “Overcoming the myopia of inductive learning algorithms with ReliefF,” Applied Intelligence, vol. 7, no. 1, pp. 39–55, 1997.
  • [24] R. J. Samworth et al., “Optimal weighted nearest neighbour classifiers,” The Annals of Statistics, vol. 40, no. 5, pp. 2733–2763, 2012.
  • [25] keras.io, “Gated recurrent unit,” 2018, https://keras.io/layers/recurrent/.
  • [26] J. Chung, Ç. Gülçehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” CoRR, vol. abs/1412.3555, 2014, http://arxiv.org/abs/1412.3555.
  • [27] C.-Y. Wu, A. Ahmed, A. Beutel, A. J. Smola, and H. Jing, “Recurrent recommender networks,” in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, ser. WSDM ’17.   New York, NY, USA: ACM, 2017, pp. 495–503. [Online]. Available: http://doi.acm.org/10.1145/3018661.3018689
  • [28] J. Konecný, B. McMahan, F. X. Yu, P. Richtárik, A. T. Suresh, and D. Bacon, “Federated learning: Strategies for improving communication efficiency,” CoRR, vol. abs/1610.05492, 2016.
  • [29] V. Smith, C.-K. Chiang, M. Sanjabi, and A. S. Talwalkar, “Federated multi-task learning,” in Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds., 2017, pp. 4427–4437.
  • [30] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017, pp. 1273–1282.
  • [31] “A micro web framework written in python,” http://flask.pocoo.org.
  • [32] “Flask socketio,” https://flask-socketio.readthedocs.io/en/latest/.
  • [33] “Flask socketio client,” https://github.com/socketio/socket.io-client.
  • [34] “gevent asynchronous framework,” https://github.com/gevent/gevent.
  • [35] “Keras deep learning library,” https://faroit.github.io/keras-docs/2.0.2/.
  • [36] J. Gamblin, “Mirai source code,” Jul. 2017, https://github.com/jgamblin/Mirai-Source-Code.
  • [37] H. Xiao, B. Biggio, G. Brown, G. Fumera, C. Eckert, and F. Roli, “Is feature selection secure against training data poisoning?” in International Conference on Machine Learning, 2015, pp. 1689–1698.
  • [38] D. H. Wolpert and W. G. Macready, “No free lunch theorems for optimization,” IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, pp. 67–82, 1997.
  • [39] S. Marchal, G. Armano, T. Gröndahl, K. Saari, N. Singh, and N. Asokan, “Off-the-hook: An efficient and usable client-side phishing prevention application,” IEEE Transactions on Computers, vol. 66, no. 10, pp. 1717–1733, 2017.
  • [40] R. M. Gerdes, T. E. Daniels, M. Mina, and S. Russell, “Device identification via analog signal fingerprinting: A matched filter approach.” in NDSS, 2006.
  • [41] C. Arackaparambil, S. Bratus, A. Shubina, and D. Kotz, “On the reliability of wireless fingerprinting using clock skews,” in ACM Conference on Wireless Network Security.   ACM, 2010, pp. 169–174.
  • [42] WiFi Alliance, WiFi Simple Configuration Technical Specification.
  • [43] N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” in Security and Privacy (SP), 2017 IEEE Symposium on.   IEEE, 2017, pp. 39–57.
  • [44] A. Kurakin, I. Goodfellow, and S. Bengio, “Adversarial examples in the physical world,” arXiv preprint arXiv:1607.02533, 2016.
  • [45] I. Evtimov, K. Eykholt, E. Fernandes, T. Kohno, B. Li, A. Prakash, A. Rahmati, and D. Song, “Robust physical-world attacks on machine learning models,” arXiv preprint arXiv:1707.08945, 2017.
  • [46] T. Vaidya, Y. Zhang, M. Sherr, and C. Shields, “Cocaine noodles: exploiting the gap between human and machine speech recognition,” WOOT, vol. 15, pp. 10–11, 2015.
  • [47] G. Zhang, C. Yan, X. Ji, T. Zhang, T. Zhang, and W. Xu, “Dolphinattack: Inaudible voice commands,” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security.   ACM, 2017, pp. 103–117.
  • [48] B. Biggio, B. Nelson, and P. Laskov, “Poisoning attacks against support vector machines,” in Proceedings of the 29th International Conference on Machine Learning.   Omnipress, 2012, pp. 1467–1474.
  • [49] B. I. Rubinstein, B. Nelson, L. Huang, A. D. Joseph, S.-h. Lau, S. Rao, N. Taft, and J. Tygar, “Antidote: understanding and defending against poisoning of anomaly detectors,” in Proceedings of the 9th ACM SIGCOMM conference on Internet measurement.   ACM, 2009, pp. 1–14.
  • [50] S. Shen, S. Tople, and P. Saxena, “Auror: Defending against poisoning attacks in collaborative deep learning systems,” in Proceedings of the 32Nd Annual Conference on Computer Security Applications, ser. ACSAC ’16.   ACM, 2016, pp. 508–519.
  • [51] D. Barrera, I. Molloy, and H. Huang, “IDIoT: Securing the Internet of Things like it’s 1994,” ArXiv e-prints, Dec. 2017, http://adsabs.harvard.edu/abs/2017arXiv171203623B.
  • [52] Y. J. Jia, Q. A. Chen, S. Wang, A. Rahmati, E. Fernandes, Z. M. Mao, and A. Prakash, “ContexloT: Towards providing contextual integrity to appified IoT platforms,” in 24th Annual Network & Distributed System Security Symposium (NDSS), feb 2017.
  • [53] S. Raza, L. Wallgren, and T. Voigt, “Svelte: Real-time intrusion detection in the internet of things,” Ad hoc networks, vol. 11, no. 8, pp. 2661–2674, 2013.
  • [54] W. Jardine, S. Frey, B. Green, and A. Rashid, “Senami: Selective non-invasive active monitoring for ics intrusion detection,” in Proceedings of the 2Nd ACM Workshop on Cyber-Physical Systems Security and Privacy, ser. CPS-SPC ’16, 2016, pp. 23–34.
  • [55] A. Kleinmann and A. Wool, “Automatic construction of statechart-based anomaly detection models for multi-threaded scada via spectral analysis,” in Proceedings of the 2Nd ACM Workshop on Cyber-Physical Systems Security and Privacy, ser. CPS-SPC ’16.   ACM, 2016, pp. 1–12.
  • [56] P. Garcia-Teodoro, J. Diaz-Verdejo, G. Maciá-Fernández, and E. Vázquez, “Anomaly-based network intrusion detection: Techniques, systems and challenges,” computers & security, vol. 28, no. 1-2, pp. 18–28, 2009.
  • [57] C. Kruegel and G. Vigna, “Anomaly detection of web-based attacks,” in Proceedings of the 10th ACM conference on Computer and communications security.   ACM, 2003, pp. 251–261.
  • [58] A. Lazarevic, L. Ertoz, V. Kumar, A. Ozgur, and J. Srivastava, “A comparative study of anomaly detection schemes in network intrusion detection,” in Proceedings of the 2003 SIAM International Conference on Data Mining.   SIAM, 2003, pp. 25–36.
  • [59] R. Sekar, A. Gupta, J. Frullo, T. Shanbhag, A. Tiwari, H. Yang, and S. Zhou, “Specification-based anomaly detection: a new approach for detecting network intrusions,” in Proceedings of the 9th ACM conference on Computer and communications security.   ACM, 2002, pp. 265–274.
  • [60] R. Sommer and V. Paxson, “Outside the closed world: On using machine learning for network intrusion detection,” in Security and Privacy (SP), 2010 IEEE Symposium on.   IEEE, 2010, pp. 305–316.
  • [61] P. Malhotra, L. Vig, G. Shroff, and P. Agarwal, “Long short term memory networks for anomaly detection in time series,” in Proceedings.   Presses universitaires de Louvain, 2015, p. 89.
  • [62] A. Nanduri and L. Sherry, “Anomaly detection in aircraft data using recurrent neural networks (rnn),” in Integrated Communications Navigation and Surveillance (ICNS), 2016.   IEEE, 2016, pp. 5C2–1.
  • [63] M. Du, F. Li, G. Zheng, and V. Srikumar, “Deeplog: Anomaly detection and diagnosis from system logs through deep learning,” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’17.   ACM, 2017, pp. 1285–1298, http://doi.acm.org/10.1145/3133956.3134015.
  • [64] A. Oprea, Z. Li, T.-F. Yen, S. H. Chin, and S. Alrwais, “Detection of early-stage enterprise infection by mining large-scale log data,” in Dependable Systems and Networks (DSN), 2015 45th Annual IEEE/IFIP International Conference on.   IEEE, 2015, pp. 45–56.
  • [65] C. Maurice, S. Onno, C. Neumann, O. Heen, and A. Francillon, “Improving 802.11 fingerprinting of similar devices by cooperative fingerprinting,” in Proceedings of the 2013 International Conference on Security and Cryptography (SECRYPT), 2013, pp. 1–8.
  • [66] H. Bojinov, Y. Michalevsky, G. Nakibly, and D. Boneh, “Mobile device identification via sensor fingerprinting,” arXiv preprint:1408.1416, 2014.
  • [67] T. Van Goethem, W. Scheepers, D. Preuveneers, and W. Joosen, “Accelerometer-based device fingerprinting for multi-factor mobile authentication,” in Proceedings of the 8th International Symposium on Engineering Secure Software and Systems, ESSoS 2016.   Springer International Publishing, 2016, pp. 106–121.
  • [68] Y. Sharaf-Dabbagh and W. Saad, “On the authentication of devices in the Internet of Things,” in Proceedinds of the 17th IEEE International Symposium on A World of Wireless, Mobile and Multimedia Networks (WoWMoM).   IEEE, 2016, pp. 1–3.
  • [69] I. Haider, M. Höberl, and B. Rinner, “Trusted sensors for participatory sensing and IoT applications based on physically unclonable functions,” in Proceedings of the 2Nd ACM International Workshop on IoT Privacy, Trust, and Security, ser. IoTPTS ’16.   ACM, 2016, pp. 14–21.
  • [70] N. Apthorpe, D. Reisman, and N. Feamster, “A smart home is no castle: Privacy vulnerabilities of encrypted iot traffic,” arXiv preprint arXiv:1705.06805, 2017.
  • [71] Y. Meidan, M. Bohadana, A. Shabtai, M. Ochoa, N. O. Tippenhauer, J. D. Guarnizo, and Y. Elovici, “Detection of unauthorized IoT devices using machine learning techniques,” CoRR, vol. abs/1709.04647, 2017.
  • [72] H. Aksu, A. S. Uluagac, and E. Bentley, “Identification of wearable devices with bluetooth,” IEEE Trans. on Sustainable Comput., pp. 1–1, 2018.
  • [73] H. Guo and J. Heidemann, “Ip-based iot device detection,” in Workshop on IoT Security and Privacy, 2018, pp. 36–42.
  • [74] G. Cheng, P.-C. Yip, Z. Xiao, R. Xia, and M. Wang, “Packet analysis based IoT management,” Oct. 2016, US Patent App. 15/087,861.

Appendix A IoT Device List

Table VII presents the 33 IoT devices used during evaluation of DÏoT.

Identifier Device Model WiFi Ethernet Other Background Activity Deployment Attack
AmazonEcho Amazon Echo
AmazonEchoDot Amazon Echo Dot
ApexisCam Apexis IP Camera APM-J011
CamHi Cooau Megapixel IP Camera
D-LinkCamDCH935L D-Link HD IP Camera DCH-935L
D-LinkCamDCS930L D-Link WiFi Day Camera DCS-930L
D-LinkCamDCS932L D-Link WiFi Camera DCS-932L
D-LinkDoorSensor D-Link Door & Window sensor
D-LinkSensor D-Link WiFi Motion sensor DCH-S150
D-LinkSiren D-Link Siren DCH-S220
D-LinkSwitch D-Link Smart plug DSP-W215
D-LinkWaterSensor D-Link Water sensor DCH-S160
EdimaxCamIC3115 Edimax IC-3115W Smart HD WiFi Network Camera
EdimaxCamIC3115(2) Edimax IC-3115W Smart HD WiFi Network Camera
EdimaxPlug1101W Edimax SP-1101W Smart Plug Switch
EdimaxPlug2101W Edimax SP-2101W Smart Plug Switch
EdnetCam Ednet Wireless indoor IP camera Cube
EdnetGateway Ednet.living Starter kit power Gateway
GoogleHome Google Home
HomeMaticPlug Homematic pluggable switch HMIP-PS
HueSwitch Philips Hue Light Switch PTM 215Z
iKettle2 Smarter iKettle 2.0 water kettle SMK20-EU
Lightify Osram Lightify Gateway
Netatmo Netatmo weather station with wind gauge
SmarterCoffee Smarter SmarterCoffee coffee machine SMC10-EU
SmcRouter SMC router SMCWBR14S-N4 EU
TP-LinkPlugHS100 TP-Link WiFi Smart plug HS100
TP-LinkPlugHS110 TP-Link WiFi Smart plug HS110
UbnTAirRouter Ubnt airRouter HP
WansviewCam Wansview 720p HD Wireless IP Camera K2
WeMoInsightSwitch WeMo Insight Switch model F7C029de
WeMoLink WeMo Link Lighting Bridge model F7C031vf
WeMoSwitch WeMo Switch model F7C027de
TABLE VII: IoT devices used in the background, activity, deployment and attack datasets and their connectivity technologies

Appendix B Confusion Matrix for Device-Type Identification

Table VIII is the confusion matrix obtained from the device-type identification evaluation.


#01 #02 #03 #04 #05 #06 #07 #08 #09 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23
type#01 480 0 0 0 0 0 0 0 0 30 0 0 0 0 0 0 0 0 0 0 0 0 0
type#02 24 1240 0 10 3 5 0 0 16 103 0 0 0 3 50 0 26 0 0 0 0 0 0
type#03 0 0 3736 0 0 10 0 0 0 0 0 24 0 0 0 0 0 0 0 0 0 0 0
type#04 0 0 0 2775 4 39 0 0 1 42 0 14 5 0 0 0 0 0 0 0 0 0 0
type#05 0 0 0 2 8707 12 0 0 10 19 0 0 0 0 0 0 0 0 0 0 0 0 0
type#06 0 0 0 10 0 3756 0 0 6 0 0 0 0 0 0 0 0 0 0 8 0 0 0
type#07 0 10 0 0 0 2 394 0 1 0 1 2 0 0 0 0 0 0 0 0 0 0 0
type#08 0 0 0 0 0 10 0 7390 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
type#09 0 0 0 0 0 0 0 0 2619 21 0 0 0 0 0 0 0 0 0 0 0 0 0
type#10 0 0 0 0 0 0 0 0 2 7323 0 0 0 17 0 0 0 0 0 0 0 0 8
type#11 0 0 0 13 38 30 0 0 0 1 2581 4 0 0 0 3 0 0 0 0 0 0 0
type#12 0 0 0 10 0 35 0 0 0 0 0 1955 0 0 0 0 0 0 0 0 0 0 0
type#13 0 0 0 0 0 0 0 0 0 21 0 0 3470 14 0 0 5 0 0 0 0 0 20
type#14 0 3 0 9 0 0 0 0 0 39 0 0 11 350 0 0 0 0 0 10 0 0 28
type#15 0 4 0 20 2 0 0 0 0 17 0 0 0 10 1412 0 0 0 0 0 0 0 5
type#16 0 0 0 2 0 3 0 0 0 0 0 18 0 0 0 2437 0 0 0 0 0 0 0
type#17 0 20 0 0 0 0 0 0 0 6 0 0 10 0 0 0 3884 0 0 0 0 0 0
type#18 0 0 0 50 0 0 0 0 12 0 1 0 0 0 0 0 12 1715 0 0 0 0 0
type#19 0 0 0 0 0 0 0 10 0 0 0 0 0 0 0 0 0 0 870 0 0 0 0
type#20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 870 0 0 0
type#21 0 0 0 0 0 0 0 0 0 1 0 0 0 2 0 0 0 0 0 0 1710 0 17
type#22 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 2 0 292 0
type#23 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 10 0 1189

TABLE VIII: Confusion matrix for device-type identification. Obtained with 10 repetitions of 4-fold cross validation. Columns represent predicted labels and rows actual labels.

Appendix C IoT Device Assignment to Types

Table IX presents the device types created during device-type identification and the assignment of each IoT device to these types.

device-type IoT device
type#01 ApexisCam
type#02 CamHi
type#03 D-LinkCamDCH935L
type#04 D-LinkCamDCS930L
D-LinkCamDCS932L
D-LinkDoorSensor
D-LinkSensor
type#05 D-LinkSiren
D-LinkSwitch
D-LinkWaterSensor
type#06 EdimaxCamIC3115
EdimaxCamIC3115(2)
type#07 EdimaxPlug1101W
EdimaxPlug2101W
type#08 EdnetCam
type#09 EdnetGateway
type#10 HomeMaticPlug
type#11 Lightify
type#12 SmcRouter
type#13 TPLinkPlugHS100
TPLinkPlugHS110
type#14 UbntAirRouter
type#15 WansviewCam
type#16 WemoLink
type#17 WemoInsightSwitch
WemoSwitch
type#18 HueSwitch
type#19 AmazonEcho
type#20 AmazonEchoDot
type#21 GoogleHome
type#22 Netatmo
type#23 iKettle2
SmarterCoffee

TABLE IX: Assignment of 33 IoT devices to 23 DÏoT device types during evaluation (Sect. VIII)

Appendix D Examined Mirai Attack Vectors

Table X shows the different attack scenarios used in collecting the attack dataset.

Scenario Description
scanning Only scanning enabled
udp UDP flood
syn SYN flood
ack ACK flood
udpplain UDP flood with less options
vse Valve source engine specific flood
dns DNS resolver flood
greip GRE IP flood
greeth GRE Ethernet flood
http HTTP flood

Source: https://www.nanog.org/sites/default/files/1_Winward_Mirai_The_Rise.pdf

TABLE X: Attack scenarios in the attack dataset.