Collaborative adversary nodes learning on the logs of IoT devices in an IoT network

by Sandhya Aneja, et al.

Artificial Intelligence (AI) development has encouraged many new research areas, including AI-enabled Internet of Things (IoT) networks. AI analytics and intelligent paradigms greatly improve learning efficiency and accuracy. Applying these learning paradigms to network scenarios provides the technical advantages of new networking solutions. In this paper, we propose an improved approach to IoT security from a data perspective. The network traffic of IoT devices can be analyzed using AI techniques. The Adversary Learning (AdLIoTLog) model is proposed using a Recurrent Neural Network (RNN) with an attention mechanism on sequences of network events in the network traffic. We define network events as a sequence of the time-series packets of protocols captured in the log. We consider different packet types in the network log, namely TCP, UDP, and HTTP packets, to make the algorithm robust. Distributed IoT devices can collaborate to cripple our world, which is extending to an Internet of Intelligence. The time-series packets are converted into structured data by removing noise and adding timestamps. An RNN is trained on the resulting data set and can detect the node pairs collaborating with each other. We used the BLEU score to evaluate the model performance. Our results show that the prediction performance of the AdLIoTLog model trained by our method degrades by 3-4% in the presence of an attack in comparison to the scenario when the network is not under attack. AdLIoTLog can detect adversaries because, when adversaries are present, the model is duped by the collaborative events and therefore predicts a biased event as the next event rather than a benign one. We conclude that AI can provision ubiquitous learning for the new generation of the Internet of Things.




I Introduction

Internet of Things (IoT) devices are resource-constrained, low-power devices that collect a large volume of data for IoT applications in healthcare, retail, transportation, and manufacturing. The data collected through IoT applications is valuable and voluminous. Many malware strains and botnets have been observed to compromise IoT devices by leveraging vulnerabilities such as default passwords. Once a device is compromised, attackers can affect its physical state [6, 5]. IoT devices are also vulnerable to a scenario in which devices connected to a gateway collaborate to mislead the smart decisions of the IoT network. In one such scenario, IoT devices in two different LANs or locations collaborate using a high-transmission-power antenna to exchange data such as temperature, pressure, and humidity. The collaborating IoT devices can then upload the distant location's data to the server as if it came from their own location, which can cripple the system, for example when a high temperature is maliciously reported as a low temperature. The Adversary Learning (AdLIoTLog) framework collects log files from the various application gateways [11] and applies deep learning [12, 14] to detect collaborating nodes that are connected through a high-transmission-power channel and behave adversarially toward other nodes, as shown in Figure 1.

Fig. 1: Deep Adversary Architecture

While capturing data through IoT, metadata can also be captured to apply AI techniques for IoT network security. Traditional AI techniques relied on centralized data. In another AI paradigm, called federated learning (FL), a model is trained across distributed systems over the cloud. An interesting observation for FL is that the model learned over distributed systems can be secured like other encrypted numbers communicated over the Internet [9]. A simple, low-complexity resource allocation algorithm has been proposed for a wireless network to support multiple FL groups [15]. IoT devices may, however, be compromised. In this paper, we propose to analyze the network traffic logs of IoT devices distributed in a network behind the application gateways. The network traffic logged at the application gateways can be used to identify compromised devices as well as collaborative adversaries.

An IoT device log comprises the chronological events of the packets of the protocols used by the device. The packets include port numbers, IP addresses, sequence numbers, flags, checksum, window size, and domain names. For example, certain domains are frequently requested by Amazon Echo, and sub-domains of certain other domains are seen in DNS queries from the HP printer [13]. Because of this complex structure, IoT log data is difficult to analyze for information.

A similar method [8, 3] authenticates an IoT device at the application gateway by analyzing 212 features, such as TCP src port and TCP dst port, from the packet headers of IoT devices in the logged network traffic.

Log analysis using the Recurrent Neural Network (RNN) method has been studied [17, 12] to predict future events. For the proposed study, two network scenarios were set up in the ns-2 network simulator to generate data. The trace files log the sequences of network events of the nodes, comprising packets of different protocol types. The first scenario was a network without any adversary node, while the second was a network with collaborating adversary nodes connected through a link-layer tunnel that served as a hidden channel for adversarial behavior toward other nodes.

The prediction performance of the Adversary Learning model degrades by 3-4% in the presence of an attack in comparison to the scenario when the network is not under attack. The model was found to be more robust for UDP packets than for TCP and HTTP packets. A network protocol fixes the packet format in the network traffic of the devices. We observed that Recurrent Neural Network models such as LSTM and GRU train with less execution time and predict better for network problems, in addition to language translation, emotion detection, and fake news detection problems. Our contributions are as follows:

  1. Detection of collaborative adversary events was found to be effective using an RNN.

  2. ns-2 network simulator trace files were generated for a collaborative attack dataset and interfaced with PyTorch for AI analytics using the RNN model.

II Gated Recurrent Neural Network - Sequence-to-Sequence Model


We take the RNN-based GRU model [4] for the problem of anomaly detection in an IoT network. Assume that the IoT network log vocabulary can be expressed, using sequences, as an input network event and a predicted network event, respectively. LSTM and GRU create internal gates to regulate the flow of information. These gates can learn which data is important and pass the relevant information along a long chain of sequence elements. The core of the GRU model is an encoder-decoder network, which consists of three parts: (i) Encoder, (ii) Attention Context Vector, and (iii) Decoder.

Encoder: An encoder is a stack of recurrent units, each of which accepts one element of the input network event sequence. The hidden states are computed with the help of the current input, the previous state, and the weights of the network. The last of these hidden states is the final hidden state of the encoder.

Attention: The context vector aims to encapsulate the information of the input network event sequence to assist the decoder in predicting the output network event sequence. The context vector acts as an initial hidden state for the decoder. It is computed with the help of the previous hidden state, the previous state, and the weights of the network, normalized over the source sequence.
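For reference, the attention computation of the cited model [4] can be written as below. The notation is an assumption taken from that reference, since this text does not fix one: h_j is the encoder hidden state at source position j, s_{i-1} is the previous decoder hidden state, a is a learned alignment function, T_x is the source sequence length, and c_i is the context vector for output step i.

```latex
\begin{align}
e_{ij} &= a(s_{i-1}, h_j), \\
\alpha_{ij} &= \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}, \\
c_i &= \sum_{j=1}^{T_x} \alpha_{ij}\, h_j .
\end{align}
```

The second line is the normalization over the source sequence: the weights for each output step are non-negative and sum to one.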

Decoder: A decoder also comprises recurrent unit cells, wherein each cell predicts an element of the output network event at a time step. Each recurrent unit cell accepts the previous target state and the source context vector to produce an output element and the next target hidden state.

The target hidden state is computed using the previous hidden state. The probability distribution over all elements of a network event in the target vocabulary is produced by the decoder using Softmax, conditioned on the previous ground truth event, the source context, and the target hidden state.

The output of a GRU is a tensor that represents the probability distribution of each network event element over the classes of the target vocabulary. The GRU is trained by defining a loss function that is minimized iteratively through backpropagation by varying the network parameters.

III Adversary Learning - System Model

An IoT device uses protocols such as UDP, TCP, HTTP, TLS, DNS, DHCP, ARP, and ICMP to upload data to the data server. IoT devices can collaborate by using a high-range transmission antenna to exchange data. Because only the collaborating IoT devices use the high-range channel, they can upload the data of a distant location as if it came from their own location. The network traffic comprises events of the packets exchanged between the application gateway (AG) and the server on the cloud. This network traffic can be logged at the application gateway for AI analytics [11]. The traffic also includes the packets exchanged by collaborating devices.

AdLIoTLog uses sequences of the packet events of the protocols and subsequences of the packet events logged for each protocol. We model AdLIoTLog using data with attack and data without attack for comparison purposes. This model can be trained without actually sharing the data [9]. Initially, a global model is aggregated; thereafter, each local model is updated and its update is provided to an aggregator. The aggregator combines all local model updates and constructs a new global model. Edge devices query the aggregator for any adversary in the network [16].
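The aggregation step can be illustrated with a minimal sketch. The text does not state how the aggregator combines local updates, so the sample-count-weighted average used here (in the style of federated averaging) is an assumption, as are the function name `aggregate`, the parameter names, and the toy weight values.

```python
from typing import Dict, List

# A model is represented as a mapping from parameter name to a flat list of values.
Weights = Dict[str, List[float]]


def aggregate(local_updates: List[Weights], sample_counts: List[int]) -> Weights:
    """Combine local model updates into a global model.

    Assumption: each update is weighted by the number of log samples
    it was trained on; the source text does not specify the rule.
    """
    if not local_updates or len(local_updates) != len(sample_counts):
        raise ValueError("need exactly one sample count per local update")
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    global_model: Weights = {}
    for name, values in local_updates[0].items():
        global_model[name] = [
            sum(update[name][i] * n for update, n in zip(local_updates, sample_counts)) / total
            for i in range(len(values))
        ]
    return global_model


# Hypothetical updates from two application gateways.
ag1_update = {"w": [0.2, 0.4], "b": [0.0]}
ag2_update = {"w": [0.6, 0.0], "b": [1.0]}
new_global = aggregate([ag1_update, ag2_update], sample_counts=[100, 300])
```

Only model parameters cross the network in this sketch; the logged traffic itself stays at each gateway, which is the property the text relies on.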

III-A Model Construction

The GRU RNN model keeps track of dependencies among the elements in sequences. Therefore, for the problem of predicting sequences of network events, the GRU is set to learn a set of pairs, where the first element of each pair is an input sequence of events and the second is the expected next sequence of network events.

Consider a set of malicious nodes that contains a pair of collaborating nodes, each with its own set of events. Each node of the pair performs its own sequence of events.

It is therefore required to learn a function that, for any given source malicious events of one node, predicts the targeted coordinated malicious events of the other node. AdLIoTLog collects IoT logs over the LAN; it therefore covers some nodes behind one application gateway and other nodes behind another application gateway. AdLIoTLog computes the probability of the possible target events in the sequence given the source events.

IV Algorithm

The AdLIoTLog aggregator combines all local model updates from the application gateways. AdLIoTLog inputs the network event sequences of the nodes behind both gateways to the encoder. Even if direct communication in the IoT network is allowed, IoT nodes behind different application gateways are not able to communicate with each other because of the low transmission power of the channel. The malicious IoT devices communicate instead over the hidden link-layer channel. Communications over a "hidden" channel include data packets as well as control packets, such as ARP packets, TCP/UDP packets, and other types of packets. The trace-driven event log comprises the communication across the IoT network.

The sequence-to-sequence model can map sequences of varying lengths to each other. The encoder tracks every output and hidden state. For example, when a network sequence of size 100 is input with a hidden size of 256, the encoder produces an output tensor of size (100, 256) and a final hidden state tensor of size 256.

The decoder uses the last output or the last hidden state of the encoder. The attention mechanism helps the decoder network compute attention weights, which are then multiplied by the encoder output vectors to create a weighted encoded vector that contains information about the input network event sequence. We used a maximum sentence length of 100 words to train the attention layer. The concept of using target outputs as the next input is called teacher forcing [7], which helps the training process converge faster. AdLIoTLog applied teacher forcing randomly with a probability of 0.5.
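Teacher forcing with a probability of 0.5 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' PyTorch code: the function `decode_with_teacher_forcing`, the `predict_next` callback that stands in for one decoder step, the `<SOS>` start token, and the toy lookup table are all hypothetical. The per-sequence coin flip is also an assumption, since the text does not say whether the choice is made per sequence or per step.

```python
import random
from typing import Callable, List, Optional


def decode_with_teacher_forcing(
    target: List[str],
    predict_next: Callable[[str], str],
    start_token: str = "<SOS>",
    teacher_forcing_ratio: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Run one decoding pass over a target sequence.

    With probability `teacher_forcing_ratio` the whole sequence is decoded
    with teacher forcing (the ground-truth element is fed as the next input);
    otherwise the decoder's own prediction is fed back.
    """
    if rng is None:
        rng = random.Random()
    use_teacher_forcing = rng.random() < teacher_forcing_ratio

    decoder_input = start_token
    outputs: List[str] = []
    for truth in target:
        predicted = predict_next(decoder_input)
        outputs.append(predicted)
        # Choose the next decoder input according to the coin flip above.
        decoder_input = truth if use_teacher_forcing else predicted
    return outputs


# Toy stand-in for a decoder step: a fixed lookup table (hypothetical).
toy_table = {"<SOS>": "send", "send": "recv", "recv": "drop", "ack": "send"}


def toy_step(previous: str) -> str:
    return toy_table.get(previous, "<UNK>")


forced = decode_with_teacher_forcing(["send", "ack", "send"], toy_step, teacher_forcing_ratio=1.0)
free_running = decode_with_teacher_forcing(["send", "ack", "send"], toy_step, teacher_forcing_ratio=0.0)
```

With the ratio set to 1.0 the decoder always sees the ground truth, and with 0.0 it always sees its own output; a ratio of 0.5 mixes the two regimes across training sequences.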

Network loss is computed in AdLIoTLog based on the decoder output and the target tensor. Network weights for both the encoder and the decoder are optimized using a stochastic gradient descent (SGD) optimizer with a learning rate in the range of 0.01-0.0001. We stored the loss after every 100 steps to track whether the network is learning.
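As a small illustration of loss tracking, the sketch below computes a negative log likelihood for one predicted distribution and smooths a series of stored losses with a moving average. It uses plain Python rather than PyTorch, and the function names `nll_loss` and `moving_average`, the window size, and the example values are assumptions made for illustration.

```python
import math
from typing import List


def nll_loss(probabilities: List[float], target_index: int) -> float:
    """Negative log likelihood of the target class under a predicted distribution."""
    return -math.log(probabilities[target_index])


def moving_average(values: List[float], window: int) -> List[float]:
    """Average of each run of `window` consecutive values (no padding)."""
    if window <= 0:
        raise ValueError("window must be positive")
    averages: List[float] = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        if i >= window - 1:
            averages.append(running / window)
    return averages


# Hypothetical decoder output over three classes, with class 1 as ground truth.
loss = nll_loss([0.25, 0.5, 0.25], target_index=1)
smoothed = moving_average([1.0, 2.0, 3.0, 4.0], window=2)
```

A falling smoothed curve indicates that the network is learning; a window of 100 would correspond to averaging over 100 stored values.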

If the events with the highest probabilities come from IoT nodes behind the same application gateway, AdLIoTLog reports the nodes as benign; otherwise, the nodes are reported as anomalous. AdLIoTLog handles different IoT traffic patterns depending on the protocols used by the IoT device. When two hosts communicate with one another, it does not always indicate malicious activity; however, if those nodes are not within range of each other, the communication indicates malicious behavior, which is modelled as nodes in two distinct AGs.
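The reporting rule can be sketched as a check on whether a predicted communication pair spans two application gateways. The function name `flag_cross_gateway_pairs`, the gateway labels, and the node-to-gateway assignment below are hypothetical; the text does not give the actual assignment used in the experiments.

```python
from typing import Dict, List, Tuple


def flag_cross_gateway_pairs(
    predicted_pairs: List[Tuple[int, int]],
    gateway_of: Dict[int, str],
) -> List[Tuple[int, int]]:
    """Return the (source, destination) pairs whose nodes sit behind different gateways.

    Nodes behind different gateways are treated as out of range of each other,
    so a predicted event between them is reported as anomalous.
    """
    flagged: List[Tuple[int, int]] = []
    for source, destination in predicted_pairs:
        if gateway_of[source] != gateway_of[destination]:
            flagged.append((source, destination))
    return flagged


# Hypothetical assignment of nodes to two application gateways.
gateway_of = {14: "AG1", 2: "AG1", 15: "AG2"}
# Hypothetical top-probability predictions: node 14 -> node 2 and node 14 -> node 15.
anomalous = flag_cross_gateway_pairs([(14, 2), (14, 15)], gateway_of)
```

Under this assumed assignment only the pair (14, 15) is flagged, while communication within one gateway is left as benign.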

The main aim of our method is to feed higher-level context, i.e., to split the input text into contextual content, so that the model's output probability distribution more closely matches the probability distribution of the ground truth values. TCP, UDP, and HTTP packets were considered in different contexts. This can potentially reduce the gap between training and inference by training the model to handle the situations that will appear at test time.

Fig. 2: (a) IoT network without collaborating nodes in ns-2 (b) IoT network with collaborating nodes in ns-2

V Experiment Setup, Results, and Analysis

The GRU RNN model is available in PyTorch. For our experiments, we used an Intel 4.7 GHz i7-8700K, an 8 GB GTX 1080 with 2560 CUDA cores, and 64 GB of dual-channel DDR4 at 2400 MHz to run the PyTorch library for the GRU model. The various hyperparameters are listed in Table I.


V-A Training Data

We used an ns-2 network simulator dataset to verify the proposed detection of collaborating nodes. We explain the training process, which includes preparing the data and training the sequence-to-sequence network.

V-A1 The dataset prepared using Network Simulator

The training dataset in ns-2 was created with 16 nodes. Figure 2 shows two scenarios: (a) an IoT network without collaborating nodes and (b) an IoT network with collaborating nodes. Node pairs were simulated as an IoT device and a data server. For example, node pair (14, 2) was simulated for node 14 to upload data to node 2. In the first case, when there is no collaboration, node 14 uploads the data to node 2. In the second case, when nodes can collaborate using the hidden channel, node 14 uploads the data to node 15. Eight UDP communication node pairs were used with a 1500-byte packet at a rate of 1 Mbps to generate the data. One pair, (14, 15), consisted of collaborating adversary nodes, which means that 12.5% of the simulated traffic constitutes the attack. The sequence-to-sequence model can remember both the benign events and the collaborative events, and can therefore detect the malicious events of the network.

V-A2 Training Sequence-to-Sequence network

To interface the ns-2 trace file with the RNN model, we need a tensor pair. A tensor indexes the IoT network log vocabulary for the input of the model. A tensor pair was prepared from a network events input tensor and a network events target tensor. The logged network events were paired one after the other, following the order of their timestamps. The input file included 12,236 network sequence pairs with 4,170 unique elements that comprise different types of packets, protocols, sequence numbers, and flags. The combined ns-2 trace files of both setups, the network with the hidden channel and the network without it, were input to the model. To compare the two scenarios, the GRU RNN was also trained without the adversary node. Figure 3 shows the variation of the model training loss (Negative Log Likelihood Loss) as a moving average over 100 iterations. Table I shows the hyperparameters used in the training process.
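The pairing and indexing steps can be sketched as follows. The trace lines here are a simplified, hypothetical layout (event type, timestamp, source node, destination node, packet type, size), not the exact ns-2 trace format used in the experiments. The helper names, the `<SOS>`/`<EOS>` markers, and the choice to keep the timestamp as a token are also assumptions.

```python
from typing import Dict, List, Tuple

Event = List[str]


def parse_trace(trace_lines: List[str]) -> List[Event]:
    """Split each trace line into tokens and order events by timestamp (field 1)."""
    events = [line.split() for line in trace_lines if line.strip()]
    return sorted(events, key=lambda fields: float(fields[1]))


def make_pairs(events: List[Event]) -> List[Tuple[Event, Event]]:
    """Pair each event with the event that follows it in time."""
    return [(events[i], events[i + 1]) for i in range(len(events) - 1)]


def build_vocabulary(events: List[Event]) -> Dict[str, int]:
    """Assign an integer index to every distinct token in the log."""
    vocabulary = {"<SOS>": 0, "<EOS>": 1}
    for event in events:
        for token in event:
            if token not in vocabulary:
                vocabulary[token] = len(vocabulary)
    return vocabulary


def to_indices(event: Event, vocabulary: Dict[str, int]) -> List[int]:
    """Convert an event to a list of vocabulary indices, terminated by <EOS>."""
    return [vocabulary[token] for token in event] + [vocabulary["<EOS>"]]


# Hypothetical trace lines, deliberately out of timestamp order.
trace = [
    "r 0.10 14 2 udp 1500",
    "+ 0.05 14 2 udp 1500",
    "r 0.20 3 2 udp 1500",
]
events = parse_trace(trace)
pairs = make_pairs(events)
vocab = build_vocabulary(events)
input_indices = to_indices(pairs[0][0], vocab)
target_indices = to_indices(pairs[0][1], vocab)
```

Each index list would then be wrapped in a tensor (for example with `torch.tensor`) before being passed to the encoder and decoder.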

Fig. 3: Variation of model training (NLL Loss) x 100 iteration

V-B Results and Analysis

We define the following performance metric based on the BLEU score [10]. The BLEU score was computed by comparing the predicted network event sequence with the ground truth network sequence using 1-grams (single words). It is 100 if the predicted sequence is exactly the same as the ground truth sequence. We define the accuracy of the model based on the number of testing pairs as follows:

Given the testing pairs and their respective BLEU scores, the accuracy of the model output is computed from these scores over the number of testing pairs.
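A 1-gram BLEU score on a 0-100 scale can be sketched as below, using clipped unigram precision and the brevity penalty from the BLEU definition [10]. Taking the accuracy as the arithmetic mean of the per-pair scores is an assumption, since the exact formula is not recoverable from the text; the function names and the example event tokens are likewise hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def unigram_bleu(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """1-gram BLEU on a 0-100 scale: clipped precision times brevity penalty."""
    if not predicted:
        return 0.0
    predicted_counts = Counter(predicted)
    reference_counts = Counter(reference)
    # Each predicted token is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, reference_counts[token]) for token, count in predicted_counts.items())
    precision = clipped / len(predicted)
    if len(predicted) > len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1.0 - len(reference) / len(predicted))
    return 100.0 * brevity_penalty * precision


def mean_accuracy(scores: List[float]) -> float:
    """Assumed accuracy: the arithmetic mean of the per-pair BLEU scores."""
    if not scores:
        raise ValueError("at least one score is required")
    return sum(scores) / len(scores)


# Hypothetical predicted and ground-truth events (source node, server, protocol).
exact = unigram_bleu(["14", "2", "udp"], ["14", "2", "udp"])
diverted = unigram_bleu(["14", "15", "udp"], ["14", "2", "udp"])
accuracy = mean_accuracy([exact, diverted])
```

In this toy case the exact prediction scores 100, while the prediction that names node 15 instead of node 2 matches two of three tokens and scores lower.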

Network  | No. of hidden layers | No. of iterations | Learning rate | Hidden layer size | Optimizer
IoT ns-2 | 1                    | 70,000            | 0.01-0.0001   | 256               | SGD

TABLE I: Hyperparameters of model

Network | Size of set | Node tuples in data without attack (node, actual data server, pred data server) | Node tuples in data with attack (node, actual data server, pred data server) | Accuracy with collaborative attack | Accuracy without collaborative attack
IoT ns-2 | 5 | (6,15,15), (3,2,14), (6,15,15), (2,3,3), (8,9,9) | (14,2,15), (0,12,12), (2,11,14), (0,12,12), (15,5,6) | 89-95% | 91-98%

TABLE II: Comparison of Model Output under collaborative-attack and non-collaborative attack scenarios

Table II compares the model output under collaborative-attack and non-collaborative-attack scenarios. The experimental results show an accuracy of 89-95% with a collaborative attack and 91-98% without a collaborative attack.

Next, we explain the findings of the experiments based on model performance. The experiments were designed to answer the following questions:

What is the performance of the GRU RNN-based models when the input network events use sequences of the packet events of protocols such as TCP, UDP, and HTTP, and subsequences of the packet events, such as sequence numbers, IP addresses, and the window scale option, logged for each protocol?

In the simulator trace files, the values of features such as sequence numbers, flag values, and IP addresses were simulated values that differ from the format used by the TCP/IP model on a network. For example, an IP address in dotted format was replaced by a node number (e.g. 14). The generated sequence numbers were relatively small, predictable numbers, which are much easier to keep track of than actual sequence numbers. Acknowledgement numbers were also not very random. We observed a higher accuracy of the GRU RNN in predicting the network events for simulator data than for dumped TCP/IP model output on an IoT network.

How does the subset of events with the top probabilities in the output tensor over the classes change between the data under collaborative attack (shown in Figure 2b) and the non-attack scenario (shown in Figure 2a)?

When the results of both models, (a) with attack data and (b) without attack data, were compared with each other on this set, we observed that the accuracy of the predicted network subsequences was lower in the presence of the attack. The collaborating nodes were connected through the hidden channel; therefore, the communication of an adversarial node behind one application gateway with its collaborating adversarial node behind the other took precedence over that of the other nodes in their respective networks. The model was biased toward predicting the collaborating class among the classes. In other words, in AdLIoTLog under the attack scenario, the probability of the target events given the source events was high most of the time for the collaborating class. If we present a node tuple as (node, actual data server, pred data server), as shown in Table II, then for tuple (14, 2, 15) the actual event of node 14 was directed to node 2, while the model predicted node 15 rather than node 2 for source 14.

How did the performance of the model vary when used for the scenario with collaborating malicious nodes?

In the simulator-generated log, there were sufficient instances for model learning. The model performance was reduced in the presence of the attack. Features were much easier to keep track of because they took relatively small, predictable values rather than the actual numbers. We observed that when there is less data to train on, the number of iterations is higher, whereas when there is more data, the number of iterations is lower. The GRU RNN can predict the event even when trained on a small amount of data with context.

Figure 3 shows variation of model training (Negative Log Likelihood Loss) with moving average over 100 iterations.

VI Background

Wu et al. [17] presented a method for analyzing big data collected through tools such as Hadoop, MapReduce, Hive, and others, based on seq2seq predictive models. Rather than inputting the sequence vector from the big data components directly, the authors labeled the data of each component to obtain the embedding vector and subsequently input the labeled vector to the attention matrix. Finally, the predicted vector is obtained using target information. In contrast, our proposed method splits the input vector on the basis of protocol context; the split input is processed by the encoder and subsequently by the decoder, which incorporates attention using the target vector.

The model developed by Shen et al. [12] for collaborative nodes attempting to exploit vulnerabilities of an intrusion protection system also used an RNN. The negative shift in model prediction was used to detect the attack. However, the attack considered was not over distributed machines, and the authors did not compare the output of the system under attack and without attack. They varied the length of the input sequence to study the performance of the RNN with increasing input length. We observed, however, that the RNN shows better results if the input is delimited using context rather than by arbitrarily varying the length of the sequences input to the model.

The model developed by Almiani et al. [1] for an intrusion detection system in an IoT network was trained on the NSL-KDD dataset for different types of attacks. Amanullah et al. [2] studied and presented deep learning technologies for IoT security. The growth in the use of deep learning models for security shows their promising applicability to IoT logs.

VII Conclusion

In this paper, we studied the performance of a GRU RNN model on the traffic log of an IoT network with and without adversary nodes. The adversary nodes are assumed to collaborate to corrupt the data to be uploaded. We found that adversary nodes can be detected without considering any additional events; it is only required to log the network traffic. The log can then be analyzed by AI algorithms to detect the adversary nodes in the network.


  • [1] M. Almiani, A. AbuGhazleh, A. Al-Rahayfeh, S. Atiewi, and A. Razaque (2019) Deep recurrent neural network for iot intrusion detection system. Simulation Modelling Practice and Theory, pp. 102031. Cited by: §VI.
  • [2] M. A. Amanullah, R. A. A. Habeeb, F. H. Nasaruddin, A. Gani, E. Ahmed, A. S. M. Nainar, N. M. Akim, and M. Imran (2020) Deep learning and big data technologies for iot security. Computer Communications 151, pp. 495–517. Cited by: §VI.
  • [3] S. Aneja, N. Aneja, and M. S. Islam (2018) IoT device fingerprint using deep learning. In 2018 IEEE International Conference on Internet of Things and Intelligence System (IOTAIS), pp. 174–179. Cited by: §I.
  • [4] D. Bahdanau, K. Cho, and Y. Bengio (2014) Neural machine translation by jointly learning to align and translate. arXiv:1409.0473. Cited by: §II.
  • [5] V. Hassija, V. Chamola, B. C. Bajpai, S. Zeadally, et al. (2020) Security issues in implantable medical devices: fact or fiction?. Sustainable Cities and Society, pp. 102552. Cited by: §I.
  • [6] J. Hou, L. Qu, and W. Shi (2019) A survey on internet of things security from data perspectives. Computer Networks 148, pp. 295–306. Cited by: §I.
  • [7] A. M. Lamb, A. G. A. P. Goyal, Y. Zhang, S. Zhang, A. C. Courville, and Y. Bengio (2016) Professor forcing: a new algorithm for training recurrent networks. In Advances In Neural Information Processing Systems, pp. 4601–4609. Cited by: §IV.
  • [8] M. Miettinen, S. Marchal, I. Hafeez, N. Asokan, A. Sadeghi, and S. Tarkoma (2017) Iot sentinel: automated device-type identification for security enforcement in iot. In 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), pp. 2177–2184. Cited by: §I.
  • [9] D. C. Nguyen, M. Ding, P. N. Pathirana, A. Seneviratne, J. Li, and H. V. Poor (2021) Federated learning for internet of things: a comprehensive survey. arXiv:2104.07914. Cited by: §I, §III.
  • [10] K. Papineni, S. Roukos, T. Ward, and W. Zhu (2002) BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting on association for computational linguistics, pp. 311–318. Cited by: §V-B.
  • [11] P. Robyns, B. Bonné, P. Quax, and W. Lamotte (2017) Noncooperative 802.11 mac layer fingerprinting and tracking of mobile devices. Security and Communication Networks. Cited by: §I, §III.
  • [12] Y. Shen, E. Mariconti, P. A. Vervier, and G. Stringhini (2018) Tiresias predicting security events through deep learning. In Proceedings of 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 592–605. Cited by: §I, §I, §VI.
  • [13] A. Sivanathan, H. H. Gharakheili, F. Loi, A. Radford, C. Wijenayake, A. Vishwanath, and V. Sivaraman (2018) Classifying iot devices in smart environments using network traffic characteristics. IEEE Transactions on Mobile Computing 18 (8), pp. 1745–1759. Cited by: §I.
  • [14] C. Sweet, S. Moskal, and S. J. Yang (2019) On the veracity of cyber intrusion alerts synthesized by generative adversarial networks. arXiv:1908.01219. Cited by: §I.
  • [15] T. T. Vu, H. Q. Ngo, T. L. Marzetta, and M. Matthaiou (2021) How does cell-free massive mimo support multiple federated learning groups?. arXiv:2107.09577. Cited by: §I.
  • [16] J. Wang, X. Yi, R. Guo, H. Jin, P. Xu, S. Li, X. Wang, X. Guo, C. Li, X. Xu, et al. (2021) Milvus: a purpose-built vector data management system. In Proceedings of the 2021 International Conference on Management of Data, pp. 2614–2627. Cited by: §III.
  • [17] P. Wu, Z. Lu, Q. Zhou, Z. Lei, X. Li, M. Qiu, and P. C. Hung (2019) Bigdata logs analysis based on seq2seq networks for cognitive internet of things. Future Generation Computer Systems 90, pp. 477–488. Cited by: §I, §VI.