Early Detection Of Mirai-Like IoT Bots In Large-Scale Networks Through Sub-Sampled Packet Traffic Analysis

01/15/2019 ∙ by Ayush Kumar, et al. ∙ National University of Singapore

The widespread adoption of the Internet of Things has led to many security issues. Recently, there have been malware attacks on IoT devices, the most prominent one being that of Mirai. IoT devices such as IP cameras, DVRs and routers were compromised by the Mirai malware and later large-scale DDoS attacks were launched using those infected devices (bots) in October 2016. In this research, we develop a network-based algorithm which can be used to detect IoT bots infected by Mirai or similar malware in large-scale networks (e.g. ISP network). The algorithm particularly targets bots scanning the network for vulnerable devices since the typical scanning phase for botnets lasts for months and the bots can be detected well before they are involved in an actual attack. We analyze the unique signatures of the Mirai malware to identify its presence in an IoT device. Further, to optimize the usage of computational resources, we use a two-dimensional (2D) packet sampling approach, wherein we sample the packets transmitted by IoT devices both across time and across the devices. Leveraging the Mirai signatures identified and the 2D packet sampling approach, a bot detection algorithm is proposed. We use testbed measurements and simulations to study the relationship between bot detection delays and the sampling frequencies for device packets. Subsequently, we derive insights from the obtained results and use them to design our proposed bot detection algorithm. Finally, we discuss the deployment of our bot detection algorithm and the countermeasures which can be taken post detection.


1 Introduction

The Internet of Things (IoT)[1] refers to the network of low-power sensing devices with limited processing capability which can send/receive data to/from other devices using wireless technologies such as RFID (Radio Frequency Identification), Zigbee, WiFi, Bluetooth and 3G/4G. IoT devices are being deployed in a number of applications such as wearables, home automation, smart grids, environmental monitoring, infrastructure management, industrial automation, agricultural automation, healthcare and smart cities. Some of the popular platforms for IoT are Samsung SmartThings (consumer IoT for device management) and Amazon Web Services IoT, Microsoft Azure IoT, Google Cloud Platform (enterprise IoT for cloud storage and data analytics). The number of IoT devices deployed globally by 2020 is expected to be in the range of 20-30 billion [2]. The number of devices has been increasing steadily (albeit at a slower rate than some earlier generous predictions), and this trend is expected to hold in the future.

IoT devices are being increasingly targeted by hackers using malware (malicious software) as they are easier to infect than conventional computers for the following reasons[3, 4, 5]:

  • There are many legacy IoT devices connected to the Internet with no security updates.

  • Security is given a low priority within the development cycle of IoT devices.

  • Implementing conventional cryptography in IoT devices is computationally expensive due to processing power and memory constraints.

  • Many IoT devices have weak login credentials either provided by the manufacturer or configured by users.

  • IoT device manufacturers sometimes leave backdoors (such as an open port) to provide support for the device remotely.

  • Often, consumer IoT devices are connected to the Internet without going through a firewall.

In a widely publicized attack, the IoT malware Mirai was used to launch the biggest DDoS (Distributed Denial-of-Service) attack on record on October 21, 2016. The attack targeted the Dyn DNS (Domain Name System) servers [6] and generated an attack throughput of the order of 1.2 Tbps. It disabled major internet services such as Amazon, Twitter and Netflix. The attackers had infected IoT devices such as IP cameras and DVRs with Mirai, thereby creating an army of bots (botnet) to take part in the DDoS attack. Apart from Mirai, there are other IoT malware which operate using a similar brute force technique of scanning random IP addresses for open ports and attempting to log in using a built-in dictionary of commonly used credentials. BASHLITE [7], Remaiten [8] and Hajime [9] are some examples of these IoT malware.

Bots compromised by Mirai or similar IoT malware can be used for DDoS attacks, phishing and spamming [10]. These attacks can cause network downtime for long periods which may lead to financial loss to network companies, and leak users’ confidential data. McAfee reported in April 2017[11] that about 2.5 million IoT devices were infected by Mirai in late 2016. Bitdefender mentioned in its blog in September 2017[12] that researchers had estimated at least 100,000 devices infected by Mirai or similar malware revealed daily through telnet scanning telemetry data. Further, many of the infected devices are expected to remain infected for a long time. Therefore, there is a substantial motivation for detecting these IoT bots and taking appropriate action against them so that they are unable to cause any further damage.

As pointed out in [13], attempting to ensure that all IoT devices are secure-by-construction is futile as there will always be insecure devices (with patched and unpatched vulnerabilities) connected to the Internet due to the scale and diversity of IoT devices and vendors. Moreover, considering the lack of full-fledged operating systems, low power requirements, resource constraints and presence of legacy devices, it is practically infeasible to deploy traditional host-based detection and prevention mechanisms such as antivirus software and firewalls on IoT devices. Therefore, it becomes imperative that the security mechanisms for the IoT ecosystem are designed to be network-based rather than host-based.

In this research, we propose a network-based algorithm which can be used to detect IoT bots infected by Mirai-like malware (which use port-based scanning) in large-scale networks. Bots scanning the network for vulnerable devices are targeted in particular by our algorithm. This is because the scanning and propagation phase of the botnet life-cycle stretches over many months and we can detect and isolate the bots before they can participate in an actual attack such as DDoS. If the DDoS attack has already occurred (due to a botnet), detecting the attack itself is not that difficult and there are already existing methods both in literature and industry to defend against such attacks. Moreover, our algorithm is practical in terms of utilization of computational resources (such as CPU processing power, memory). For example, ISP (Internet Service Provider) network operators can use the proposed algorithm to identify infected IoT devices in their network. The operators can then take suitable countermeasures such as blocking the traffic originating from IoT bots and notifying the local network administrators. Actions that can be taken post bot detection are further discussed in a later section. The major contributions of this paper are listed below:

  1. We have analyzed the traffic signatures produced by Mirai malware infecting IoT devices through testbed experiments. Further, we have identified specific signatures which can be used to positively detect the presence of Mirai and similar malware in IoT devices. These signatures are similar to the observations reported in [14] based on their analysis of the Mirai source code.

  2. We have proposed an algorithm to detect Mirai-like IoT malware bots in large-scale networks. The algorithm is based on a novel two-dimensional sampling approach where the device packets are sampled across time as well as across the devices.

The rest of this paper is organized as follows. In Section 2, we review a few prominent works on detecting botnets by exploiting CnC communication features and on intrusion detection systems for IoT. Subsequently, in Section 3, we explain the operation of Mirai, extract important features from the traffic generated by Mirai bots in a testbed and present a detailed analysis of those features towards detecting Mirai-like bots. Section 4 formulates the optimization problem resulting from detection of IoT bots in large-scale networks along with the constraints imposed by limited computational resources, followed by the proposed bot detection algorithm. The algorithm is numerically evaluated and the results are presented in Section 5. Finally, in Section 6, the implementation of the proposed IoT bot detection algorithm in a real-world network is discussed as well as the mitigating actions that can be taken post detection.

2 Related Work

There are several works in the literature on detecting botnets using their CnC communication features. We list a few prominent ones in this section. The authors in [15] present machine-learning based classification methods to detect CnC traffic of IRC (Internet Relay Chat) botnets by differentiating between IRC and non-IRC traffic and then differentiating between bot and real IRC traffic. BotHunter [16] builds a bot infection dialog model using the network communication flows between internal hosts and external entities during successful bot infections. Three bot-specific sensors are constructed based on the dialog model and correlation is performed between inbound intrusion/scan alarms and the infection dialog model to generate a consolidated report. In BotSniffer [17], spatio-temporal similarities between bots in a botnet in terms of bot-CnC coordinated activities are captured from network traffic and leveraged towards botnet detection in a local area network. In BotMiner [18], the authors have proposed a botnet detection system which clusters similar CnC communication traffic and similar malicious activity traffic, and uses cross cluster correlation to detect bots in a monitored network. It is claimed to be independent of CnC protocol and structure with no requirement of a priori knowledge about the botnets. A system for detecting covert P2P (Peer-to-Peer) botnets has been proposed in [19]. After extracting the statistical CnC communication features for P2P botnets, the botnet detection system utilizes them to distinguish between legitimate and malicious P2P traffic.

There has also been some research on intrusion detection and anomaly detection systems for IoT. A whitelist-based intrusion detection system for IoT devices (Heimdall) has been presented in [20]. Heimdall is based on dynamic profile learning and is designed to work on routers acting as gateways for IoT devices. The authors in [21] propose an intrusion detection model for IoT backbone networks leveraging two-layer dimension reduction and two-tier classification techniques to detect U2R (User-to-Root) and R2L (Remote-to-Local) attacks. In a recently published paper [22], deep autoencoder-based anomaly detection has been used to detect attacks launched from IoT botnets. The method consists of extraction of statistical features from behavioral snapshots of normal IoT device traffic captures, training of a deep learning-based autoencoder (for each IoT device) on the extracted features and comparison of the reconstruction error for traffic observations with a threshold for normal-anomalous classification. The proposed detection method was evaluated on Mirai and BASHLITE botnets formed using commercial IoT devices.

While a number of the above anomaly detection works leverage ML (machine learning)-based approaches, there are several issues associated with them[23]. One of the major issues is the occurrence of false positives. Even a small percentage of false positives, e.g. 1% which is considered acceptable in academic research on anomaly detection, can lead to thousands of alerts per day based on the traffic volume processed[24]. Both false positives and false negatives have costs (e.g. financial expenses for an organization) associated with them, with the cost associated with false negatives typically being much higher. Second, many research works on anomaly detection using ML fail to explain why a particular ML algorithm would perform well in the system under consideration. Third, many ML algorithms are suitable for offline batch operations rather than low-latency real-time detection. Finally, instead of starting from the premise of using an ML approach for a detection task, which is a common flaw in anomaly detection research, one should carry out a neutral evaluation of all the available tools for the task and then decide on the most appropriate one.

Our work addresses a few important gaps in the literature when it comes to distinguishing between legitimate and botnet IoT traffic. First, almost all the works cited above on detecting botnets using their CnC communication features [15, 16, 17, 18] utilize all the packets transmitted by all the devices in a monitored network for a specific time period towards designing a botnet detection solution. This approach is highly impractical if the resulting solution is to be deployed for IoT devices in real world networks. The reason is that processing all the packets for all devices in a large network would require a lot of computational resources as illustrated in Section 4.1. Second, our focus is not only on detecting bots employing IRC-based CnC communications as done in [15]. The bot detection algorithm proposed by us in Section 4.1 is independent of the bot-CnC communication protocol. Third, we do not aim to detect botnets (networks of bots) but instead, individual bots. Therefore, we don’t require computationally expensive clustering algorithms as used in [17, 18].

Fourth, we do not extract CnC communication features and use them to identify bot-CnC communications as done in [17, 18, 19]. This is because we aim to detect bots infected by Mirai-like IoT malware, towards which much simpler features can be used as discussed in Section 3.3. Fifth, unlike [22], we aim to detect IoT bots much before the actual attack, during the scanning phase itself as explained in Section 4. Finally, most of the above cited works use quantifiers such as detection rate and false positive rates to evaluate the performance of their proposed botnet detection solutions. Instead, we use a quantity called average detection delay (defined in Section 4.1) for the performance evaluation of our proposed bot detection solution since the features used by our solution eliminate the possibility of inaccurate detections or false positives. To the best of our knowledge, there are no existing papers on detecting IoT bots compromised by Mirai or its variants which exhibit port-based SYN scanning behavior.

3 Mirai Traffic Analysis

Detecting IoT devices compromised by Mirai-like malware requires us to analyze the packet traffic generated by those devices and extract some features to aid us in detection. In this section, we begin with a brief description of the operation of Mirai to familiarize readers with some of the related terms. Later, we present a testbed that we use to emulate IoT devices, infect them with Mirai and capture the packet traffic generated from them. Finally, we present the extracted features and analyze them in detail with respect to identifying Mirai bots.

3.1 Mirai Operation

The Mirai [25] setup consists of three major components: bot, scanListen/loading server, and the CnC (Command-and-Control) server. The CnC server also functions as a MySQL[26] database server. User accounts can be created in this database for customers who wish to hire DDoS-as-a-service. The operation of Mirai is illustrated in Fig. 1. Once an IoT device is infected with Mirai (and becomes a bot), it first attempts to connect to the listening CnC server by resolving its domain name and opening a socket connection. Thereafter, it starts scanning the network by sending SYN packets to random IP addresses and waiting for them to respond. This process may take a long time since the bot has to go through a large number of IP addresses. Once it finds a vulnerable device with a TELNET port open, it attempts to open a socket connection to that device and emulates the TELNET protocol. Then it attempts to log in using a list of default credentials and, if a working credential is found, it reports the IP address of the discovered device and the working TELNET login credentials to the listening scanListen server. The scanListen server sends that information to the loader, which again logs in to the discovered device using the details received from the scanListen server. Once logged in, the loader downloads the Mirai bot binary to that device and the new bot connects to the CnC server and starts scanning the network.

Figure 1: Operation of various components of Mirai (Source: Radware [27])

3.2 Testbed Description

The testbed shown in Fig. 2 was configured on an isolated computing cluster. Each cluster node has two Intel Xeon E5-2620 processors and 64 GB of DDR4 ECC memory, and runs the Ubuntu 14.04 LTS standard image. The testbed consists of a local authoritative DNS server, a CnC (Command-and-Control) server and a server for scanListen and loading utility, all connected to a single LAN. The IoT gateways are connected to the above LAN through routers and behind the gateways are QEMU[28]-emulated IoT devices (Raspberry Pi). We chose this gateway-IoT device topology since it is used in a number of IoT deployments (such as IP cameras, smart lighting devices, wearables etc.). The testbed also includes a few non-IoT devices (PCs) to reflect real-world networks. As per our information, this is the first controlled testbed to simulate the true behavior of Mirai malware. It can be modified to add more nodes, study a different network topology and test more advanced versions or derivatives of Mirai malware.

Figure 2: Testbed used to simulate Mirai behavior

3.3 Mirai Traffic Features

We infected the emulated IoT devices in our testbed with Mirai and captured a total of 1,583,623 packets transmitted by the devices. An analysis of the captured packets reveals the following features/signatures:

  • The scanning packets are all TCP SYN (synchronization) packets.

  • The destination port numbers of scanning packets are distributed as 90% port 23 and 10% port 2323. No other port numbers are observed.

  • There is a periodic exchange of keep alive messages (PSH+ACK) between the bot and the CnC server. PSH refers to a push message and ACK refers to acknowledgement.

Both ports 23 and 2323 are assigned for TELNET applications[29, 30]. The TELNET[31] protocol is used for bidirectional byte-oriented communication. In the most widely used implementation of TELNET, a user with a terminal and running a TELNET client program, accesses a remote host running a TELNET server by requesting a connection to the remote host and logging in by providing its credentials. The most common application of TELNET is for configuring network devices such as routers. Now, IoT devices operate by continuously transmitting sensed data to and receiving commands from cloud servers through a gateway over a secure communication channel without external human input[32]. We claim that an IoT device is unlikely to be used to access or configure another device using TELNET, and therefore in the absence of malware infection, IoT devices should not open TELNET connections to any other device.

To verify our claim that uninfected IoT devices are not expected to open TELNET connections, the following experiment was conducted. We configured a Raspberry Pi 3 (Model B+) to act as a gateway and connected it to several real-world IoT devices such as IP cameras (D-Link), motion sensors (D-Link), smart bulbs (Philips Hue), smart switches (WeMo) and smart plugs (TPLink). We left the devices connected for a long time and for each device type mentioned above, we captured around 10,000 packets per device at the gateway interface. Later, the captured packets were analyzed using Wireshark [33] and no SYN packets with destination ports 23 or 2323 were found. Thus, if a SYN packet from an IoT device with destination port number 23 or 2323 is received, it is sufficient evidence to conclude with certainty that the IoT device is infected with a Mirai-like malware. The above experiment also helps us to rule out false positives, if any at all, when we use the identified scanning traffic signatures, which is a substantial advantage when it comes to practical intrusion detection.
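The scanning signature described above can be checked directly on a raw TCP header. The following is a minimal sketch, assuming a standard 20-byte TCP header layout (destination port in bytes 2-3, flag bits in byte 13). The function names `is_mirai_like_scan` and `make_tcp_header` are hypothetical, and treating only a SYN without ACK as a scanning packet is an assumption of this sketch rather than something stated in the text.

```python
import struct

TELNET_PORTS = {23, 2323}  # destination ports observed in the scanning traffic
SYN, ACK = 0x02, 0x10      # TCP flag bits


def is_mirai_like_scan(tcp_header: bytes) -> bool:
    """Return True for a SYN segment (without ACK) aimed at a TELNET port."""
    if len(tcp_header) < 20:
        return False  # too short to be a complete TCP header
    # Bytes 2-3 hold the destination port; byte 13 holds the flag bits.
    (dst_port,) = struct.unpack_from("!H", tcp_header, 2)
    flags = tcp_header[13]
    pure_syn = bool(flags & SYN) and not (flags & ACK)
    return pure_syn and dst_port in TELNET_PORTS


def make_tcp_header(src_port: int, dst_port: int, flags: int) -> bytes:
    """Build a minimal 20-byte TCP header, for demonstration only."""
    offset_byte = 5 << 4  # data offset of 5 words, i.e. no options
    return struct.pack("!HHIIBBHHH", src_port, dst_port,
                       0, 0, offset_byte, flags, 65535, 0, 0)


scan_pkt = make_tcp_header(40000, 2323, SYN)
web_pkt = make_tcp_header(40000, 443, SYN)
print(is_mirai_like_scan(scan_pkt), is_mirai_like_scan(web_pkt))
```

Only two header fields are inspected, which is what makes the check cheap per packet; the cost discussed later comes from applying it to every packet of every device.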

The third Mirai signature related to keep-alive messages is not required since the port-scanning signatures are sufficient for detection with certainty. We may require the third signature to detect more advanced malware which do not use TELNET port-based scanning. It needs to be emphasized here that the TELNET port-scanning signatures can be used to identify not only bots infected by Mirai but also other Mirai-like malware such as BASHLITE, Remaiten, Hajime etc. which employ a similar TELNET port brute forcing technique.

4 Mirai-like IoT Malware Bot Detection

The bot scanning traffic analyzed in the previous section cannot be detected using simple firewalls. Since IoT devices are usually resource-constrained, they do not have firewalls installed on them. Moreover, network-level firewalls (protecting computers in a LAN/WAN/intranet) are not configured to block TELNET traffic which in most cases may be legitimate. In this section, we formulate the optimization problem arising out of detecting IoT bots in large-scale networks with the accompanying computational resource constraints. Further, we propose an algorithm for bot detection based on our analysis.

4.1 Formulation of Optimization Problem

Receiving a SYN packet with destination port number 23 or 2323 in its TCP (Transmission Control Protocol) header is sufficient to identify the transmitting IoT device as infected. However, we cannot strip off the TCP headers and check the encapsulated TCP flags and destination port numbers for all the packets transmitted by all the IoT devices in a network, as this would require a lot of computational resources (both processing power and memory) from the bot detection device that captures and processes the IoT device packets. To give an example, the total number of IoT devices being used in the U.S. stands at 715 million[34]. Given that there are 12 major ISPs operating in the U.S.[35], assuming an equal number of IoT devices in each ISP network yields 59.58 million devices per ISP. The IEEE 802.15.4 standard [36], which forms the basis of most IoT communication protocols, allows a peak data rate of 250 kbit/s. The peak total IoT device data rate for an ISP network can thus be estimated as 14,895 billion bits/s (IoT devices are considered to be always ON once installed).

To send or receive 1 bit/s of TCP/IP, 1 Hz of processing speed is required as a general rule of thumb. However, as shown in [37], this rule doesn’t always hold and the Hz/bps ratio increases up to 6-7 times for small data transfers (payload size of the order of 64 bytes) as compared to larger transfers. In fact, the maximum payload size allowed by IEEE 802.15.4 link headers is 81 bytes. Further, the Hz/bps ratio increases when one goes to higher CPU speeds. Now, socket receive processing can take up to 23% of the total processing required for TCP processing for small transfer sizes. Hence, the processing speed required for socket operations of TCP flag and destination port lookup at just 10% of the estimated peak total IoT device data rate for an ISP network can be calculated as 2,398 GHz. This translates to a requirement of nearly 480 additional 2.5 GHz dual-core processors for a single ISP, just for detecting IoT bots. This represents a significant investment from ISP companies. Moreover, as the number of devices is increasing steadily with time, the above investment is only bound to grow.
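The estimate above can be retraced step by step. The short calculation below is a sketch that uses only the figures quoted in the text. It assumes a baseline of 1 Hz per bit/s scaled by the upper end (7x) of the quoted 6-7x range, and assumes each dual-core processor contributes twice its clock rate; under these assumptions the quoted 2,398 GHz and roughly 480 processors are reproduced. The variable names are illustrative.

```python
# Figures quoted in the text
iot_devices_us = 715e6      # IoT devices in use in the U.S.
major_isps = 12             # major ISPs operating in the U.S.
peak_rate_bps = 250e3       # IEEE 802.15.4 peak data rate (bit/s)
socket_share = 0.23         # share of TCP processing spent on socket receive
monitored_fraction = 0.10   # fraction of the peak rate considered
cpu_hz = 2.5e9              # clock rate per core
cores_per_cpu = 2           # dual-core processors

# Assumption: 1 Hz per bit/s, scaled 7x for small transfers
hz_per_bps_small = 7

devices_per_isp = iot_devices_us / major_isps
peak_total_bps = devices_per_isp * peak_rate_bps
required_hz = (peak_total_bps * monitored_fraction
               * hz_per_bps_small * socket_share)
processors = required_hz / (cpu_hz * cores_per_cpu)

print(f"devices per ISP : {devices_per_isp / 1e6:.2f} million")
print(f"peak total rate : {peak_total_bps / 1e9:,.0f} Gbit/s")
print(f"processing need : {required_hz / 1e9:,.0f} GHz")
print(f"processors      : {processors:.0f}")
```

The peak total rate printed here differs from the quoted 14,895 billion bits/s only in the last digit, because the text rounds the per-ISP device count to 59.58 million before multiplying.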

Therefore, for our bot detection problem, we propose to sample only a fraction of the IoT devices per unit time for TCP processing. This will reduce the number of IoT packets that need to be processed by the TCP stack, bringing down the computational resources required. However, this approach has the drawback that we may miss the scanning packets due to the sub-sampling operation. This leads to the formulation of the following optimization problem to detect infected devices.

Our objective in this optimization problem is to minimize the cost associated with the delay in detecting a compromised device. We define the average detection delay as the average time between the first occurrence of a scanning packet and the positive conclusion that the originating device is infected. Now, some IoT devices in a network are easier to infect with malware than others. Therefore, we split the IoT devices into two categories: vulnerable and non-vulnerable devices. Vulnerable devices are those which are easier to infect with Mirai-like malware and add to the botnet. All other devices are non-vulnerable devices. For example, personal IoT devices installed at homes can be deemed vulnerable since they are less likely to be behind a firewall (host-level firewalls are not feasible on IoT devices due to resource constraints) and more likely to have their TELNET ports open (owners often buy cheap devices in which the manufacturer has left the TELNET port open for remote configuration etc.). IoT devices installed in enterprise/industrial/government networks can be categorized as non-vulnerable since they would most likely be behind a network-level firewall (blocking access to insecure TELNET connections) and are much less likely to have their TELNET ports open (due to organizational IT security policies).

We define the sampling frequency for an IoT device as the fraction of the time when that device is selected for monitoring for possible infection. We also define the sampling matrix as a matrix with columns representing devices and rows representing the packets transmitted by those devices. An element of the sampling matrix is equal to 1 when the corresponding packet has been sampled and equal to 0 when the corresponding packet has not been sampled.

Further, our optimization problem imposes the following constraints that need to be satisfied:

  • The sampling frequency for a vulnerable device should be greater than the sampling frequency for a non-vulnerable device. This is because vulnerable devices are more likely to be attacked than non-vulnerable devices and hence they need to be more frequently monitored.

  • The total number of vulnerable and non-vulnerable devices selected within a certain time period should not exceed a maximum number, which is expressed in terms of the fractions of the total number of devices that are vulnerable and non-vulnerable respectively. This limits the utilization of computational resources: if the total number of selected devices exceeds an upper bound, it may require significant amounts of processing power, defeating the purpose of packet sub-sampling.

  • The maximum number of vulnerable devices selected at any time should have an upper bound. Similarly, the maximum number of non-vulnerable devices selected at any time should have an upper bound. This is again to place a bound on computational resource utilization.

  • After a certain number of sampling time units, every device in the set of all devices should be covered by the sampling process. This ensures that every device is checked for malware infection within a certain time duration; otherwise, a few infected devices may be missed by the sampling process.

We propose to minimize the cost associated with the average detection delay, subject to the above constraints.

The formulation is expressed in terms of the cost incurred by the bot detection algorithm due to a unit average detection delay, the number of vulnerable and non-vulnerable devices selected in the sampling matrix at any point of time, the set of vulnerable devices, the set of non-vulnerable devices, and a function that outputs the set of devices sampled in the sampling matrix at a given time. It is to be noted that the above optimization problem is a combinatorial one and it is computationally hard to find an optimal solution[38]. Hence, we devise a method to numerically solve the optimization problem. The results obtained from the numerical analysis are explained in Section 5. Based on our findings through the formulation of the optimization problem, we have proposed an algorithm for detecting IoT bots (shown in Algorithm 1) which is practical in terms of the lower number of packets that need to be monitored for infected device detection. The sampling frequency values to be used while designing our algorithm will be discussed in our numerical analysis.
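The objective and the four constraints listed above can be written compactly. The following is one possible formalization, not necessarily the exact form intended by the authors. All symbol names are assumptions introduced here: $c$ is the cost per unit average detection delay, $\bar{T}_D$ the average detection delay, $\mathbf{S}$ the sampling matrix, $f_v$ and $f_{nv}$ the sampling frequencies of vulnerable and non-vulnerable devices, $n_v(t)$ and $n_{nv}(t)$ the numbers of vulnerable and non-vulnerable devices selected at time $t$, $\mathcal{D}$, $\mathcal{D}_v$ and $\mathcal{D}_{nv}$ the sets of all, vulnerable and non-vulnerable devices, and $g(\mathbf{S}, t)$ the set of devices sampled at time $t$. The dependence of the bound $N_{\max}$ on the vulnerable and non-vulnerable fractions is left implicit.

```latex
\begin{aligned}
\min_{\mathbf{S}} \quad & c \, \bar{T}_D \\
\text{subject to} \quad
  & f_v > f_{nv}, \\
  & \sum_{t=1}^{T} \bigl( n_v(t) + n_{nv}(t) \bigr) \le N_{\max}, \\
  & n_v(t) \le n_v^{\max}, \qquad n_{nv}(t) \le n_{nv}^{\max}
    \qquad \forall t, \\
  & \bigcup_{t=1}^{T_c} g(\mathbf{S}, t) = \mathcal{D}, \\
\text{where} \quad
  & n_v(t) = \lvert g(\mathbf{S}, t) \cap \mathcal{D}_v \rvert, \qquad
    n_{nv}(t) = \lvert g(\mathbf{S}, t) \cap \mathcal{D}_{nv} \rvert .
\end{aligned}
```

Here $T$ denotes the time period over which the total number of selected devices is bounded and $T_c$ the number of sampling time units after which every device must have been covered.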

Initialize the sampling matrix, NUM_PKTS, t.
for pktcnt ← 1 to NUM_PKTS do
    if src_dev(recv_pkt) ∉ list_dev then
        add_dev_to_list(src_dev(recv_pkt), list_dev)
    end if
    add_pkt_to_buf(recv_pkt, dev_buf(src_dev(recv_pkt)))
end for
while TRUE do
    sel_dev_set ← dev_set(sampling matrix, t)
    for i ← 1 to length(sel_dev_set) do
        sampled_pkts(t,i) ← dev_buf(sel_dev_set(i), CURRENT_PKT)
    end for
    for j ← 1 to length(sampled_pkts(t,:)) do
        if Check_TCP_flag(sampled_pkts(t,j)) = SYN and Check_dst_port(sampled_pkts(t,j)) ∈ {23, 2323} then
            Bot_detected(src_dev(sampled_pkts(t,j))) ← TRUE
        end if
    end for
    t ← t + 1
end while
Algorithm 1: IoT Bot Detection Algorithm
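The steps of Algorithm 1 can be sketched in Python as follows. This is an illustrative simplification rather than the authors' implementation. It processes a finite, pre-recorded trace instead of looping forever, represents each packet by a hypothetical `Packet` record, and takes the sampling schedule as a function `dev_set(t)` returning the devices selected at time `t`. The round-robin schedule and the device names are made up for the example.

```python
from collections import namedtuple

# Hypothetical packet record: source device id, SYN flag, destination port.
Packet = namedtuple("Packet", "src_dev is_syn dst_port")
TELNET_PORTS = {23, 2323}


def detect_bots(packets_by_time, dev_set):
    """Return {device id: time of detection} for devices flagged as bots.

    packets_by_time[t] maps a device id to the packet it sends at time t.
    dev_set(t) returns the ids of the devices sampled at time t.
    """
    detected = {}
    for t, current in enumerate(packets_by_time):
        for dev in dev_set(t):
            pkt = current.get(dev)
            if pkt is None:
                continue  # the device sent nothing at this time
            if pkt.is_syn and pkt.dst_port in TELNET_PORTS:
                detected.setdefault(dev, t)  # keep the earliest detection
    return detected


devices = ["cam", "bulb", "plug"]


def round_robin(t):
    """Toy schedule: sample exactly one device per time unit."""
    return [devices[t % len(devices)]]


# Four time units of normal traffic (port 443), then inject two scans.
trace = [{dev: Packet(dev, False, 443) for dev in devices} for _ in range(4)]
trace[1]["cam"] = Packet("cam", True, 23)    # missed: "bulb" is sampled at t=1
trace[3]["cam"] = Packet("cam", True, 2323)  # caught: "cam" is sampled at t=3

detected = detect_bots(trace, round_robin)
print(detected)
```

In this toy trace the first scanning packet is sent at t=1 but the device is only sampled again at t=3, so the detection delay is two time units. This is the trade-off between sampling cost and detection delay that the optimization problem captures.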

5 Evaluation of Proposed Algorithm

In this section, we analyze the behavior of average detection delay for vulnerable and non-vulnerable devices with varying sampling rates. A few important background details are presented below:

  • The set of attacked devices is selected based on the assumed probability model for malware attack on vulnerable and non-vulnerable devices. For example, we can assume one probability of attack on vulnerable devices within a given time duration (measured in packets’ transmission) and another on non-vulnerable devices.

  • The sampling matrix used in our evaluation has a staggered structure and may be visualized as in Fig. 3. Since the sampling frequency for vulnerable devices is greater than that for non-vulnerable devices, the portion of the sampling matrix containing packets transmitted by vulnerable devices has a denser distribution of 1s than that for non-vulnerable devices. The structure of the matrix also ensures that every device is sampled after a certain number of sampling time units as required by one of the constraints in the optimization problem presented in Section 4.1.

    Figure 3: Sampling matrix example

  • We form a scanning matrix of size (number of IoT devices) × (number of packets transmitted). The matrix uses 0 to represent a normal IoT device packet and 1 to represent a malware scanning packet. Only the attacked devices have 1s in their corresponding rows in the scanning matrix.

  • The elements where the scanning and the sampling matrices are both 1 represent detected scanning packets. This is because the matching elements would only be present where the scanning packet transmitted by an attacked device has been selected by the sampling process.
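The interplay of the two matrices can be illustrated with a small example. The sketch below is only a toy instance under stated assumptions. Rows are devices and columns are packet slots (following the scanning matrix description), the staggered structure is produced by giving each device a sampling period and an offset equal to its index, and the delay is measured in packet slots. The function names and all values are hypothetical.

```python
def staggered_sampling_matrix(periods, num_slots):
    """Row d has a 1 in every slot where (slot - d) is a multiple of periods[d]."""
    return [
        [1 if (slot - dev) % period == 0 else 0 for slot in range(num_slots)]
        for dev, period in enumerate(periods)
    ]


def detection_delays(sampling, scanning):
    """Slots between the first scanning packet and the first sampled one.

    Returns {device: delay}; the delay is None if no scan was ever sampled.
    Devices that never scan are omitted.
    """
    delays = {}
    for dev, (samp_row, scan_row) in enumerate(zip(sampling, scanning)):
        if 1 not in scan_row:
            continue  # device was not attacked
        first_scan = scan_row.index(1)
        hits = [s for s, (a, b) in enumerate(zip(samp_row, scan_row)) if a and b]
        delays[dev] = hits[0] - first_scan if hits else None
    return delays


# Devices 0-1 are vulnerable (sampled every 2 slots),
# devices 2-3 are non-vulnerable (sampled every 4 slots).
sampling = staggered_sampling_matrix([2, 2, 4, 4], num_slots=8)

scanning = [
    [0, 0, 0, 0, 0, 0, 0, 0],  # device 0: not attacked
    [0, 0, 1, 1, 0, 0, 1, 0],  # device 1: scans in slots 2, 3 and 6
    [0, 0, 0, 1, 0, 1, 0, 0],  # device 2: scans in slots 3 and 5
    [0, 0, 0, 0, 0, 0, 0, 0],  # device 3: not attacked
]

delays = detection_delays(sampling, scanning)
print(delays)
```

Device 1 is sampled in every second slot, so its scan in slot 3 is caught one slot after its first scan. Device 2 is sampled only in slots 2 and 6 and both of its scans fall between samples, so it remains undetected within this short window.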

Moreover, we need to form a statistical model for scanning packet arrivals in the scanning matrix. Towards this, we used one of our emulated IoT devices and established a video streaming server to simulate the operation of an IP camera (IoT device used in Mirai attack on Dyn). Another emulated IoT device acted as a client connected to the video stream. The other emulated devices were configured to have their TELNET port number 23 open and listening for connections. Subsequently, we infected the video streaming device with Mirai and captured the transmitted packets at its gateway interface using Wireshark. Our observations from the packet capture are listed below:

  • The video streaming packets are transmitted almost continuously. The transmission is interrupted only by bot-CNC server communication packets, scanning packets and some other types of packets such as ARP (Address Resolution Protocol).

  • The bot scanning packets are sometimes transmitted within short intervals and at other times far apart, as shown in Fig. 4.

Based on the above empirical observations, we model the scanning packet arrivals as a Poisson process, i.e., the inter-arrival times of scanning packets are exponentially distributed, with the average packet arrival rate calculated from the testbed measurements. At all other times, we assume that normal IoT traffic is transmitted, again based on the above observations.
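A Poisson arrival model of this kind can be simulated by accumulating exponentially distributed inter-arrival times. The sketch below uses only the Python standard library; the function name, the rate and the time horizon are placeholders chosen for illustration and are not the values measured on the testbed.

```python
import random


def scanning_packet_arrivals(rate, horizon, rng):
    """Arrival instants of a Poisson process with the given rate on [0, horizon).

    Inter-arrival times are drawn from an exponential distribution with
    mean 1 / rate, which is the defining property of a Poisson process.
    """
    arrivals = []
    t = rng.expovariate(rate)
    while t < horizon:
        arrivals.append(t)
        t += rng.expovariate(rate)
    return arrivals


rng = random.Random(0)
rate = 0.01  # hypothetical: on average one scanning packet per 100 packet slots
arrivals = scanning_packet_arrivals(rate, 100_000, rng)

# The mean gap between consecutive arrivals should be close to 1 / rate.
gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
mean_gap = sum(gaps) / len(gaps)
```

Exponential gaps naturally produce both tightly clustered and widely separated arrivals, which is qualitatively the pattern noted in the second observation above.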

Figure 4: Arrival times of scanning packets

Parameter Value
40
80
0.5
50
Total no. of IoT devices 100
Percentage of vulnerable devices 40
No. of packets transmitted per device 100,000
Avg. rate of arrival of scanning packets (per packet elapsed) 3386
Table 1: Parameter values assumed in numerical analysis

The values assumed for the various parameters in our analysis are shown in Table 1. Fig. 5 plots the average detection delay against the sampling frequency for different values of the attack probability on vulnerable devices. The detection delay values are averaged over all the detected devices as well as over 1000 trial runs. The average detection delay is expressed in number of packets elapsed, and the sampling frequency per packet elapsed. It can be observed that the average detection delay decreases almost exponentially with increasing sampling frequency. This behavior can be explained intuitively as follows. Increasing the sampling frequency means that the vulnerable devices are sampled more often, which in turn increases the likelihood of sampling the scanning packets transmitted by infected vulnerable devices. Once a scanning packet is sampled, it can be concluded that the corresponding source device is infected, as discussed in Section 3.3. Hence, an increase in the likelihood of sampling scanning packets should lead to a decrease in the average detection delay as defined in Section 4.1. Further, the plot shows that increasing the sampling frequency beyond a certain value (e.g. ’0.33’ for ) yields a slower reduction in average detection delay. This suggests that, when designing the proposed Algorithm 1, the sampling frequency for vulnerable devices should be selected from the upper half of the range of available values, but not too high: beyond that point, higher sampling frequencies bring little additional reduction in average detection delay while consuming more computational resources.
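The intuition that denser sampling shortens the detection delay can be checked with a small Monte Carlo experiment. The sketch below is a deliberately simplified toy model and not the simulation used for Fig. 5: it assumes that every packet slot of an infected device independently holds a scanning packet with a fixed probability, and that the device is sampled periodically from a random offset. The function name and all parameter values are hypothetical.

```python
import random


def mean_detection_delay(sampling_period, scan_prob, num_packets, trials, rng):
    """Average index of the first sampled scanning packet over many trials.

    Trials in which no scanning packet is sampled are ignored.
    """
    delays = []
    for _ in range(trials):
        offset = rng.randrange(sampling_period)
        # Only sampled slots matter, so draw the packet type at those slots only.
        for t in range(offset, num_packets, sampling_period):
            if rng.random() < scan_prob:  # the sampled slot holds a scanning packet
                delays.append(t)
                break
    return sum(delays) / len(delays) if delays else None


rng = random.Random(42)
scan_prob = 0.05  # hypothetical probability that a slot holds a scanning packet

# Sampling period 2 corresponds to frequency 0.5, period 10 to frequency 0.1.
delay_fast = mean_detection_delay(2, scan_prob, 10_000, 2000, rng)
delay_slow = mean_detection_delay(10, scan_prob, 10_000, 2000, rng)
```

In this toy model the expected delay grows roughly in proportion to the sampling period, so it should not be expected to reproduce the near-exponential shape reported for Fig. 5; it only illustrates the direction of the effect.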

One may observe that the average detection delay decreases slightly as the attack probability increases. This is expected: a higher attack probability means that more vulnerable devices are likely to be infected, which increases the likelihood of sampling the scanning packets transmitted by those infected devices and thus decreases the average detection delay. Lastly, the plots for the three attack probabilities are quite close to each other, suggesting that changes in attack probability do not significantly affect the relationship between average detection delay and sampling frequency.

In Fig. 6, we illustrate the distribution of average detection delays for vulnerable devices using a histogram, for a sampling frequency of 0.2 and an attack probability of 0.6. The distribution closely fits an exponential distribution with a mean of , suggesting that the probability of observing a given average detection delay decreases almost exponentially as the delay increases. Vulnerable devices are sampled at a relatively higher frequency and also have a higher probability of being infected than non-vulnerable devices. Therefore, scanning packets can be detected with lower delays in most trials, resulting in higher probabilities for lower values and lower probabilities for higher values of average detection delay.
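One simple way to check an exponential fit of this kind is to estimate the mean from the samples (the maximum-likelihood estimate for an exponential distribution is the sample mean) and compare the fitted tail probability with the empirical one. The sketch below uses synthetic delays with a made-up mean of 500 packets, since the measured values are not reproduced here; all names and numbers are illustrative.

```python
import math
import random


def fit_exponential_mean(delays):
    """Maximum-likelihood estimate of the mean of an exponential distribution."""
    return sum(delays) / len(delays)


def empirical_tail(delays, x):
    """Fraction of observed delays that exceed x."""
    return sum(1 for d in delays if d > x) / len(delays)


rng = random.Random(1)
# Synthetic stand-in for measured delays (hypothetical mean of 500 packets).
delays = [rng.expovariate(1 / 500.0) for _ in range(5000)]

mean = fit_exponential_mean(delays)
threshold = 1000
model_tail = math.exp(-threshold / mean)         # P(delay > threshold) under the fit
observed_tail = empirical_tail(delays, threshold)
```

If the two tail probabilities agree at several thresholds, the exponential model is a reasonable description of the histogram.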

Figure 5: Average detection delay vs sampling frequency plot for vulnerable devices
Figure 6: Histogram of Average detection delays for vulnerable devices

In Fig. 7, we plot the average detection delay against the sampling frequency for different values of the attack probability on non-vulnerable devices. The plot is somewhat irregular at lower sampling frequencies. At higher sampling frequencies, the average detection delay decreases almost linearly with increasing sampling frequency. The intuitive explanation for this decrease is similar to the one given above for vulnerable devices. When designing the proposed Algorithm 1, a very high sampling frequency for non-vulnerable devices may lower the average detection delay, but the corresponding increase in processing power and memory requirements may not be desirable, since non-vulnerable devices are not expected to be compromised easily. A very low sampling frequency, on the other hand, may increase the average detection delay significantly in the unexpected scenario where some of the non-vulnerable devices are compromised. Therefore, the algorithm designers may have to settle for a sampling frequency in the middle of the range of available values. Fig. 8 shows the distribution of average detection delays for non-vulnerable devices as a histogram, for a sampling frequency of 0.025 and an attack probability of 0.2. The distribution takes its highest values for average detection delays between 0 and 10,000. Thereafter, the values taken by the distribution decrease slowly with increasing average detection delay.

Figure 7: Average detection delay vs sampling frequency plot for non-vulnerable devices
Figure 8: Histogram of Average detection delays for non-vulnerable devices

6 Implementation of IoT Bot Detection Algorithm

As mentioned earlier in Section 4.1, our proposed bot detection algorithm has to be run on dedicated bot detection devices within a given network. These sentinel (monitoring) devices should have enough processing power and memory to run the bot detection algorithm for a large number of IoT devices. The sentinel devices can be placed higher up the network hierarchy, but below the core network routers, to make maximum use of the sub-sampling approach employed by our proposed algorithm. We propose that a sentinel device should monitor only the IoT devices connected to a few access network routers, which implies that an ISP network would require multiple sentinel devices to monitor all the IoT devices in that network.

We also need to process only IoT device packets at the sentinel devices, whereas the network traffic consists of IoT as well as non-IoT traffic (PCs, smartphones, etc.). The authors in [39] distinguish between traffic generated by IoT and non-IoT devices from a single TCP session by analyzing the user-agent HTTP property for smartphones and using single-session binary classifiers for PCs. They report a classification accuracy of 100% for smartphones and false positive and false negative rates of 0.003 each for PCs. We can use their methods to distinguish between IoT and non-IoT device packets using a single session's worth of packets. Further, once we identify a device as IoT or non-IoT, we can continue to use this information in the future, as the device type is not expected to change.

It is assumed that the ISPs already have access to the information regarding vulnerable and non-vulnerable devices. As explained earlier in Section 4.1, IoT devices installed in home environments can be regarded as vulnerable, while the devices installed in enterprise, industrial or government networks can be deemed non-vulnerable. The routers can be configured to forward copies of received packets, together with the corresponding source device IP addresses, to the sentinel devices. The incoming packets at the sentinel devices can be arranged and stored in buffers according to their source devices. Fig. 9 shows a prospective network deployment for our proposed algorithm, illustrating the path of an IoT device packet as it originates from an IoT device and passes through the IoT gateway, network routers and sentinel devices. We expect the firmware running on sentinel devices to be upgradeable so that, if more advanced bot detection algorithms are designed in the future (e.g. for IoT malware which does not rely on port-based scanning), the corresponding software updates can be easily pushed to the sentinel devices. We cannot run our algorithm on existing core network routers since they process a packet only up to its IP (Internet Protocol) header, whereas the destination port numbers are carried within the TCP header.
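The per-device buffering and inspection at a sentinel device might be organized as in the following sketch. It is an illustration only: the `Packet` and `Sentinel` types are hypothetical, and the signature check assumes that a scanning packet is a TCP SYN addressed to TELNET port 23 or 2323, which is one commonly reported Mirai scanning pattern and may differ from the full set of signatures used by the algorithm.

```python
from collections import defaultdict, deque
from dataclasses import dataclass

# Assumption: TELNET ports probed by Mirai-style scanners.
SCAN_PORTS = {23, 2323}


@dataclass
class Packet:
    src_ip: str      # source device IP address forwarded by the router
    dst_port: int    # destination port taken from the TCP header
    is_syn: bool     # True for a TCP SYN (connection attempt)


class Sentinel:
    """Hypothetical sentinel that buffers packet copies per source device."""

    def __init__(self, buffer_size=1000):
        # One bounded buffer per source device; old packets are dropped first.
        self.buffers = defaultdict(lambda: deque(maxlen=buffer_size))
        self.flagged = set()

    def receive(self, pkt):
        self.buffers[pkt.src_ip].append(pkt)

    def inspect(self, src_ip, sampled_indices):
        """Check only the sampled buffer positions for a scanning signature."""
        buf = self.buffers[src_ip]
        for i in sampled_indices:
            if i < len(buf) and buf[i].is_syn and buf[i].dst_port in SCAN_PORTS:
                self.flagged.add(src_ip)
                return True
        return False
```

Keeping one bounded buffer per source address lets the sampling step pick packet positions for each device independently, at the cost of memory that grows with the number of monitored devices.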

Once the bots are detected by our proposed algorithm, the next step is to take mitigating actions to prevent the bots from causing further damage. The network administrator can block all traffic originating from the bots and bring them back online only after it is confirmed that the malware has been removed from those IoT devices. The concerned ISP can inform the device owners and ask them to secure their devices (by using strong usernames/passwords, placing the device behind a firewall, etc.). Another defense mechanism is to allow the bot to communicate with a few secure domains for remediation of the malware infection, instead of blocking all of its traffic. This strategy is part of the bot remediation techniques [40] recommended for ISPs by the IETF (Internet Engineering Task Force). The bot can also be placed under continuous monitoring, with all communication denied except that required for the underlying IoT device to function. Finally, security personnel can exploit bugs in the bot binary to disinfect the bots remotely.
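The option of restricting a detected bot to a few remediation domains can be expressed as a simple allow-list check. The sketch below is hypothetical: the domain names are placeholders, and a real deployment would enforce such a policy in the network equipment rather than in application code.

```python
# Placeholder domains a quarantined device may still reach for remediation.
REMEDIATION_DOMAINS = {"updates.vendor.example", "remediation.isp.example"}


def allow_traffic(src_ip, dst_domain, quarantined):
    """Return True if traffic from src_ip to dst_domain should be forwarded.

    Devices that are not quarantined are unrestricted; quarantined devices
    may only reach the remediation domains.
    """
    if src_ip not in quarantined:
        return True
    return dst_domain in REMEDIATION_DOMAINS


quarantined = {"10.0.0.5"}  # hypothetical address of a detected bot
```

A device would be removed from the quarantined set only after the infection is confirmed to be cleaned.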

Figure 9: Prospective network deployment for proposed bot detection solution

7 Future Work

We are developing a software prototype of the proposed bot detection algorithm [41], which will be evaluated on an extended version of our Mirai testbed (Fig. 2) emulating a real-world network of connected IoT and non-IoT devices, gateways, routers and the proposed sentinel devices. It is not feasible to test our algorithm with a network of physical devices, as we would require hundreds of thousands of IoT devices in addition to gateways, routers and other networking equipment to replicate real-world large-scale networks. Further, we are also studying the optimal placement of sentinel devices in a network by performing a cost analysis. In the future, we would like to develop solutions for detecting IoT bots infected with malware that exploits software vulnerabilities to compromise devices and add them to the botnet. For instance, the Linux.Darlloz, Reaper and Amnesia malware [42, 43, 44] use HTTP (Hyper Text Transfer Protocol)-based exploits to perform code injection and execute arbitrary code on remote devices, bypassing authentication. It should be noted here that the packet sub-sampling approach proposed in this paper is likely to be a part of the bot detection solution devised for such advanced malware. Finally, some malware may try to evade detection, e.g. by attempting to hide its scanning activity. Detecting such evasive IoT malware would be an interesting problem.

8 Conclusion

In this paper, we proposed an algorithm for detecting IoT devices infected by Mirai or similar malware. The bot detection algorithm uses Mirai traffic signatures and a two-dimensional sub-sampling approach. Leveraging measurements taken from a testbed constructed to simulate the behavior of Mirai, we studied the relationship between average detection delays and sampling frequencies for vulnerable and non-vulnerable devices. Based on our analysis of the plots, we made suggestions regarding the selection of sampling frequencies when designing our proposed algorithm. Subsequently, we discussed the deployment of our bot detection algorithm within a real-world network, where we proposed using special sentinel devices to run the algorithm. Prospective actions which can be taken after the detection of bots were also mentioned. Finally, we identified a few interesting problems stemming from this research which we would like to work on in the future.

Acknowledgment

The authors would like to thank Dr. Liang Zhenkai (SoC, NUS) for helping us with some of the initial ideas used in this paper and Dr. Min Suk Kang (SoC, NUS) for providing comments on our manuscript. We would also like to thank the National Cybersecurity R&D Lab, Singapore for allowing us to use their testbed to collect important data used in our work. This research is supported by the National Research Foundation, Prime Minister’s Office, Singapore under its Corporate Laboratory@University Scheme, National University of Singapore, and Singapore Telecommunications Ltd.

References