There is a growing number of cybersecurity threats related to the extended utilisation of Cloud Computing and Edge Computing. The Brazilian Center for Studies, Response and Treatment of Security Incidents (CERT.br), which monitors attack attempts and their types, shows a growing trend of cyberattack incidents such as Distributed Denial of Service (DDoS) attacks [1, 2], whose incidence grew by 125.36% between the first quarter of 2016 and the same period of 2015. Such attempts, successful or not, result in economic, reputational, and social impact. A report from PwC Consulting describes the economic impact of cybersecurity breaches in areas such as disruption of operations and manufacturing, compromise of sensitive data, negative impact on products and services, damage to physical property, and harm to human life. The scaling number of virtual crimes and the exploitation of vulnerabilities in Distributed Computing demand new forms of preventive measures to preserve security and privacy.
We are motivated by the need for effective Cybersecurity strategies for intrusion detection and fast response, aiming to prevent disruption, preserve privacy and security, and optimise operations. Cybersecurity threats are primarily linked to the storage and transfer of large chunks of information, along with their importance and vulnerability. The most common security threats in Distributed Computing include hijacking, man-in-the-middle, denial of service, phishing, and others [5, 6, 7]. Buyya et al. point to the lack of well-defined security strategies in heterogeneous environments combining Cloud Computing and Edge Computing. This issue is mostly due to the characteristics of the environment, involving distributed architectures, complex and heterogeneous elements, and large-scale operations.
We are looking into a combination of Autonomic Computing [9, 10] and Big Data to deal with the large volume of information collected from the audits of the various system components and to provide rapid response. This work contributes to the state-of-the-art by:
Providing a reference architecture for an Autonomic Intrusion Response System based on a combination of Autonomic Systems and Big Data.
Presenting a proof-of-concept implementation of a full-cycle attack-response interaction in heterogeneous Distributed Computing environments.
Analysing this approach’s performance for accuracy, efficiency, and scalability to real-world scenarios.
In what follows, we elaborate on the background, state-of-the-art, and technology gap. Section III outlines our proposal. Section IV describes the results from executing a proof-of-concept implementation upon experimentation environments of private and public clouds. We discuss the results and opportunities in Section V.
II Background and Related Work
System administrators demand Intrusion Detection System (IDS) approaches to minimise the harm caused by hackers, crackers, and other cyber-criminals [11, 12, 13]. In general, preventive systems employ techniques to analyse the behaviour and origin of the attempts and then define whether the action is allowed. Response time is crucial to prevent intrusions. Cohen points out that a skilled intruder's attack has an 80% chance of success if the response time is around 10 hours, a 95% chance if the intruder has 20 hours, and beyond 30 hours the attack becomes virtually infallible; however, if the response is immediate, then the chances of the intruder's success are practically nil. Nonetheless, current approaches present a significant time gap between detection and response, mostly due to the need for manual intervention [16, 17, 18].
There is a cohort of research looking into how to improve IDS towards quick detection of malicious or unauthorised actions [19, 20], and intelligent management methods for Distributed Computing [6, 21]. In general, an IDS encompasses:
Detection, usually performed automatically by monitoring patterns in the systems’ log entries and behaviour of the elements.
Warning, triggered via analysis of behaviour patterns and raising awareness of potential issues to system administrators.
Decision making, providing decision support based on data analytics systems.
Response, implementing actions upon the elements, along with evaluation of their results.
Buyya et al. argue that existing strategies for attack detection and response still fail to provide satisfactory results for Distributed Computing environments. Notably, current implementations present a delay between Detection and Response. Moreover, current developments tend to focus on Detection and Warning, whereas there is a critical demand to minimise the manual intervention in Decision Making and Response.
Hence, there is a technology gap between strategies for detecting attack attempts and existing response mechanisms. Although fast response is a clear demand, current implementations mostly depend on manual interventions, rendering these solutions slow and ineffective. In this context, the motivation for this work encompasses:
define the requirements for an effective autonomic attack response;
outline the required decision-making algorithms to support these systems;
field test the proposed approaches in controlled environments in order to evaluate performance, applicability to real-world scenarios, and scalability to large Distributed Computing systems.
The Autonomic Intrusion Response System (SARI) follows the vision of autonomic computing around self-healing, self-protection, and self-optimisation. The solution works based on the Monitor-Analyse-Plan-Execute-Knowledge (MAPE-K) architecture to efficiently analyse large amounts of data about the utilisation of Distributed Computing resources.
Figure 1 depicts the system architecture. We devised an approach to collect system log datasets about network traffic, system information, and sensors, following the proposal by Suthaharan. The solution pre-processes these datasets to consolidate information and remove noise. The cycles for analysis and planning implement the MapReduce strategy to correlate the information; this is a programming model designed to process large volumes of data in parallel, dividing the work into a set of independent tasks [23, 24]. Our architecture encompasses the following components:
Monitoring Module implements probes to collect information about behaviour changes of the managed elements, and other execution information; these datasets include e.g. system logs and data from other monitoring systems installed in the Virtual Machines, such as Snort, OSSEC, the Hypervisor, network traffic, system settings, and SNMP data [11, 25]; sensors are designed to collect data from Hypervisors and VM instances through the jNetPcap library.
Analysis Module implements the processes for categorisation (or mapping) and reduction; in this module, MapReduce is applied to (i) identify the signatures of known attacks and (ii) extract significant data such as the origin of the attack, features of the data packages, and others. This process analyses and classifies data packages in relation to their protocol. Then, the process applies protocol-specific algorithms that reduce data volumes. The result is a data hierarchy for analysis and a compilation of the possible issues causing the attacks.
Planning Module implements the MAPE-K loop strategy based on the theory of expected utility. The planning component collects data from the Analysis Module to characterise the current situation. Then, the planning process applies the theory of expected utility to select the response action that is most likely to work in the determined scenario: the various possible alternatives applicable to the situation are analysed, and the one that brings the highest response value for a given environment configuration is selected.
Execution Module performs the notification or response action on detected intrusions depending on the configuration.
Knowledge Base holds the information required for the system's operations, such as: collected data; known signatures; time values; cost and probability of each response; applied response techniques; environment settings, and more.
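The interplay of the four modules with the shared knowledge base can be sketched as a minimal MAPE-K loop (a toy, single-process illustration; all names, signatures, and responses below are hypothetical, not the SARI implementation):

```python
# Minimal MAPE-K skeleton mirroring the four modules plus the shared
# knowledge base (all names and the toy logic are illustrative).
knowledge = {"signatures": {"SYN flood"},
             "responses": {"SYN flood": "block_source_ip"}}

def monitor(raw_events):
    """Monitoring: collect raw events from probes/sensors."""
    return list(raw_events)

def analyse(events):
    """Analysis: keep only events matching known attack signatures."""
    return [e for e in events if any(sig in e for sig in knowledge["signatures"])]

def plan(incidents):
    """Planning: pick a response action for each detected incident."""
    return [knowledge["responses"][sig]
            for e in incidents for sig in knowledge["signatures"] if sig in e]

def execute(actions):
    """Execution: apply (here: merely report) the selected responses."""
    return [f"executed:{a}" for a in actions]

log = ["10.0.0.5 SYN flood detected", "10.0.0.7 normal GET /index"]
outcome = execute(plan(analyse(monitor(log))))
print(outcome)  # ['executed:block_source_ip']
```

In the actual architecture each stage is a distributed module rather than a local function, and the knowledge base is persistent rather than an in-memory dictionary.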
The solution employs a knowledge-based approach to detect known attacks by comparing attack signatures to suspicious actions [27, 28]. Figure 2 depicts the operation. The strategy applies MapReduce to allow working on large datasets through parallel execution on a cluster of machines. Files are split into smaller pieces to be distributed through the cluster during the partition step. Each fragment is distributed to an instance that executes the map algorithm (see Algorithm 1) and the reduce algorithm (see Algorithm 2), sequentially. The output is forwarded to the Planning Module for rule-based analysis of the consolidated information.
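The map/reduce pair can be sketched in plain Python (a minimal, single-process illustration of the roles of Algorithms 1 and 2, not the Hadoop implementation; the signature strings and log format are hypothetical):

```python
from collections import defaultdict
from itertools import chain

# Hypothetical knowledge base of known attack signatures (substrings).
SIGNATURES = ["SYN flood", "SQL injection", "port scan"]

def map_phase(log_fragment):
    """Map step: emit (signature, 1) for every log line matching a signature."""
    for line in log_fragment:
        for sig in SIGNATURES:
            if sig in line:
                yield (sig, 1)

def reduce_phase(pairs):
    """Reduce step: sum the counts per signature."""
    counts = defaultdict(int)
    for sig, n in pairs:
        counts[sig] += n
    return dict(counts)

# Each fragment would run on a separate cluster node; here we chain them.
fragments = [
    ["10.0.0.5 SYN flood detected", "10.0.0.7 normal GET /index"],
    ["10.0.0.5 SYN flood detected", "10.0.0.9 SQL injection attempt"],
]
result = reduce_phase(chain.from_iterable(map_phase(f) for f in fragments))
print(result)  # {'SYN flood': 2, 'SQL injection': 1}
```

In Hadoop, the partition step distributes the fragments and the framework shuffles the (key, value) pairs between map and reduce; the sketch collapses that into a single process.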
III-A Response Strategy
The response strategy follows the expected utility principle. Decisions are made based on the probability of positive intrusion events versus uncertainty about response effectiveness. For the decision process, the model takes into consideration environmental elements, such as: parameters of the cloud environment; target virtual machines; parameters of the attack set; and response parameters such as cost of actions, effect time, success probability, effectiveness history, and others. The system model includes the following parameters:
E: environmental parameters;
A: parameters of the attack set;
R: parameters of the response set;
O: result of the actions.
The expected utility formula is given as follows:

EU(a) = Σ_{o ∈ O} P(o|a) · U(o)

where: O is the result set; P(o|a) is the probability of the result o conditioned to the action a; and U(o) is the utility of the result (response) o. Translating to the construction of the AIRS system, we have the components defined as:
A: set of attacks.
R: set of actions that are responses to attacks.
O: result set, depending on whether or not the responses work.
C: set of costs of executing (processing) action-responses to attacks.
T: set of time durations for action-response execution.
P: set of probabilities of a result being an action-response success.
Other definitions involve costs, elapsed times and probabilities, as follows: for each action-response a_i ∈ R, c_i ∈ C is its execution cost, t_i ∈ T its elapsed time, and p_i ∈ P its success probability.
Given these definitions, the expected utility of a response-action a_i considering the result set O is given by:

EU(a_i) = Σ_{o ∈ O} P(o|a_i) · U(o), for i = 1, …, n

where n is the number of actions defined in the system; a_i is an action-response; p_i = P(success|a_i) is the effectiveness probability of a_i; and EU(a_i) is the expected utility of a_i. Normalisation is implemented by shifting the values of each resource so that the minimum value is 0 and then dividing by the new maximum value, which is the difference between the original maximum and minimum values. The system applies the following method to normalise the utility calculation:

x' = (x − min) / (max − min)
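The normalisation step described above amounts to standard min-max scaling; a minimal sketch (the sample cost values are hypothetical):

```python
def normalise(values):
    """Min-max normalisation: shift so the minimum becomes 0,
    then divide by the new maximum (original max minus min)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # degenerate case: all values equal
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

costs = [100.0, 250.0, 400.0]  # hypothetical raw action costs
print(normalise(costs))        # [0.0, 0.5, 1.0]
```

The same transformation is applied independently to each resource (costs, elapsed times) so that all quantities fall in [0, 1] before entering the utility calculation.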
Let Act be the set of all possible actions in a processing environment, such that an element a ∈ Act is an action that can be performed, and R ⊆ Act represents the set of possible responses. Then, a response r ∈ R is defined as a possibly effective action, where: O is the result set; P(o|r) is the conditional probability of the result o given the action r; and U(o) is the utility of o.
In sum, the method applies cost c_i, time t_i and probability p_i, where n is the possible number of attacks. The utility is calculated as:

EU(a_i) = Σ_{o ∈ O} P(o|a_i) · U(o)

where a_i corresponds to an action-response to a determined attack, as per the examples in Table I; O is the result set; P(o|a_i) is the conditional probability of the result o given the action a_i; and U(o) is the utility of o.
In the proposed model, the largest sum corresponds to the most useful action-response, which means that we infer a preferable action-response a*. That is, a* is more likely to be effective than its peers.
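Selecting the preferable action-response thus amounts to computing the expected utility of every candidate and taking the argmax. A minimal sketch under stated assumptions: the probability, cost, and time figures are hypothetical, and the concrete utility function combining them is an assumption of this illustration (the paper derives it from its tables), not the system's exact expression:

```python
# Hypothetical knowledge base: per action-response, its success probability
# and its normalised cost and elapsed time for a given attack.
responses = {
    "block_source_ip":  {"p": 0.95, "cost": 0.45, "time": 0.10},
    "throttle_traffic": {"p": 0.95, "cost": 0.44, "time": 0.80},
    "restart_service":  {"p": 0.60, "cost": 0.20, "time": 0.30},
}

def utility(r):
    # Assumed utility: reward success probability, penalise cost and time.
    return r["p"] - 0.5 * (r["cost"] + r["time"])

def expected_utility(r):
    # Two outcomes: success (utility as above) and failure (utility 0),
    # weighted by their probabilities: EU(a) = sum_o P(o|a) * U(o).
    return r["p"] * utility(r) + (1 - r["p"]) * 0.0

best = max(responses, key=lambda name: expected_utility(responses[name]))
print(best)  # block_source_ip: same probability as throttling, far lower time
```

This mirrors the behaviour discussed in the experiments: among actions with equal success probability and similar cost, the one with the smallest elapsed time wins.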
III-B Application Example
Table II presents an application example of the expected utility method considering a knowledge base with previously estimated utility costs c_i, elapsed times t_i, and success probabilities p_i for the attacks.
Let us assume that two attacks are detected. The first step is to normalise the values of c and t. Table III presents the normalised values. Next, we apply the utility formula for each set, resulting in the values depicted in Table IV.
Next, we sum the utilities that correspond to the actions for the two attacks and choose the action with the highest total utility.
In the example above, the selected action has the best utility and will be chosen within the proper context. Moreover, it is possible to adjust the values based on observation of the effectiveness of actions in given contexts. One important factor to observe is the elapsed time to implement the actions, aiming at effective and efficient delivery.
We developed a proof-of-concept implementation to evaluate the approach and executed it in two scenarios: (i) VMs running on a private cloud in our Lab, and (ii) VMs running on the Amazon public cloud. For both cases, we generated two sets of data representing (a) legitimate access and (b) security attacks. The implementation uses Java 8 and the jNetPcap library. Dedicated attack nodes and legitimate nodes were used to perform the tests. The experiment generated a considerable amount of data and demanded extensive processing time.
The script to generate attacks was implemented upon Scapy, which allows generating network traffic and injecting attacks. The script dynamically mounts a TCP packet specifying data such as source port, destination port, source IP, destination IP, payload, and the ACK package. The configuration included the target machine and a QLInject payload parameter with large data volumes, forming a typical DDoS attack, presented in Code 1.
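Code 1 is not reproduced here; the sketch below assembles the same TCP header fields with Python's standard struct module, to illustrate what the Scapy script mounts (all addresses, ports, and the payload are hypothetical, and the IP header that would carry the source/destination addresses is omitted for brevity):

```python
import struct
import socket

def build_tcp_header(src_port, dst_port, seq=0, ack=0, flags=0x02, window=65535):
    """Pack a 20-byte TCP header (checksum left at 0 for illustration)."""
    offset_flags = (5 << 12) | flags  # data offset = 5 words, SYN flag = 0x02
    return struct.pack("!HHIIHHHH",
                       src_port, dst_port, seq, ack,
                       offset_flags, window, 0, 0)

src_ip = socket.inet_aton("10.0.0.5")    # hypothetical attacker address
dst_ip = socket.inet_aton("10.0.0.100")  # hypothetical target address
payload = b"A" * 1400                    # oversized payload, as in the DDoS scenario
packet = build_tcp_header(40123, 80) + payload
print(len(packet))  # 1420: 20-byte header + 1400-byte payload
```

Scapy hides this bit-packing behind its layered `IP()/TCP()` objects, which is why the original script could mount and vary these fields dynamically.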
Sensors were installed at specific capturing points to test the Monitoring Module. Then, the environment was configured to choose the network interface to be monitored. The captured data was stored in files containing the packages. Two criteria were established: elapsed time and data volume. If either criterion was triggered, the monitoring files would be sent to the Analysis Module. The SARI process entails:
collecting the log file from the VMs;
transferring the files to the detection server, and;
executing the detection algorithm.
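The two shipping criteria (elapsed time and data volume) can be sketched as a simple check run by the monitoring loop; the threshold values below are hypothetical:

```python
import time

MAX_AGE_SECONDS = 60         # ship at least once a minute (hypothetical)
MAX_SIZE_BYTES = 5_000_000   # ... or once the capture file reaches ~5 MB

def should_ship(file_size, opened_at, now=None):
    """Return True when either the time or the volume criterion triggers."""
    now = time.time() if now is None else now
    return (now - opened_at) >= MAX_AGE_SECONDS or file_size >= MAX_SIZE_BYTES

# The monitoring loop would then transfer the capture file to the
# detection server and start a fresh file.
print(should_ship(1_000, opened_at=0, now=30))      # False: neither criterion met
print(should_ship(1_000, opened_at=0, now=90))      # True: time criterion
print(should_ship(6_000_000, opened_at=0, now=10))  # True: volume criterion
```

Triggering on whichever criterion fires first bounds both detection latency (time cap) and per-transfer load (size cap).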
Table VI presents variations of datasets generated through multiple experimentation configurations, listing for each configuration the size in MB, the number of packets, and the number of attacks.
The Analysis and Planning modules are implemented in Java using the Hadoop library to support MapReduce. For each experiment, a cluster was created to process the analysis. This module receives the datasets for processing and generates results like the one depicted in Table VII (source IP, destination IP, attack, and quantity). Table VIII presents the parameters for the utility calculation (actions, probability, normalised cost, and normalised elapsed time).
IV-A Experimenting on a Private Cloud
Figure 3 depicts the testing environment implemented over a private cloud in our laboratory. The environment is composed of a CloudStack orchestration system and a Xen hypervisor running on Debian. We created: (a) a set of VMs representing the invaders; (b) a set of VMs with web servers and databases representing the target; and (c) a cluster of 3 computers to execute the SARI processes.
Figure 4 shows the behaviour of the analysis module against the different configurations. Figure 5 depicts the normalised ratios in the testing environment experiments. There are two values close to the maximum utility (0.637 and 0.650), since they represent actions where the probability is the same (95%). The execution cost presents small variation (0.443 and 0.451). However, the execution time presents ample variation (9.118 and 42.011). We conclude that the thread with the smallest processing time has the greatest utility (0.650).
IV-B Experimenting on a Public Cloud
The public cloud experimentation setup is similar to the private cloud one. The implementation was on the Amazon Web Services (AWS) platform, simulating legitimate and intrusive users against 4 target machines as service providers. The SARI system is represented by an interface for monitoring and analysis, along with a cluster for planning and execution and the knowledge base.
Figure 6 presents the results from the execution on the public cloud. Figure 7 presents the utility function for the cost of actions. The most useful action was processed in 950 seconds at a cost of 1900 units.
The key difference between the public and private cloud experiments is the time lag between the steps, which is shorter in the public cloud environment due to the larger availability of computational resources.
Hence, we conclude that the approach delivers a significant improvement in response effectiveness and has the potential to scale to large environments.
We presented a reference architecture for Automated Intrusion Detection based on Big Data methods for the classification, understanding, and prediction of behavioural deviance in Distributed Computing environments. The proposed solution addresses the technology gap in attack detection and response strategies that provide satisfactory results in Distributed Computing environments.
The Autonomic Intrusion Response System follows the vision of autonomic computing and works based on the Monitor-Analyse-Plan-Execute-Knowledge (MAPE-K) architecture to efficiently analyse large amounts of data about the utilisation of Distributed Computing resources. The solution employs a knowledge-based approach to detect known attacks by comparing attack signatures to suspicious actions. The strategy applies MapReduce to allow working on large datasets through parallel execution on a cluster of machines.
We evaluated the proposed approach through a prototype implementation against two scenarios: (i) VMs running on a private cloud in our Lab, and (ii) VMs running on the Amazon public cloud. The results demonstrate the effectiveness of the solution, allowing large volumes of access information to be processed within acceptable time delays for both the private cloud and the public cloud experiments. We demonstrated that the approach is able to handle real-world scenarios and deliver low-latency response results, aligned with the requirements for Automated Intrusion Detection. Hence, we conclude that the approach delivers a significant improvement in response effectiveness and has the potential to scale to large environments.
We argue that a product-grade implementation based on the proposed reference architecture would effectively reduce the damage caused by diverse forms of Cyber attacks on Distributed Computing environments.
As a limitation, the proposed approach does not contemplate optimisation of the algorithmic complexity of the expected utility theory. That is, given an attack, the algorithm needs to calculate the sum of the utility of each response. This calculation grows exponentially with the number of responses implemented in the model. Another limitation is that the knowledge-based detection method applies rules only for known attacks within the scope of this work. We consider these limitations acceptable, as the purpose of the project was to demonstrate the feasibility of using Big Data strategies and provide a reference architecture for the implementation. Further work will have to extend this discussion to attain product-grade implementations.
Further work may also involve research into the application of Machine Learning and Cognitive Computing to detect attacks beyond the scope of the implemented rules. New attack signatures could be discovered and incorporated into the architecture's knowledge base, thus continuously improving the system's effectiveness over time. This strategy would progress the solution towards a self-learning and self-adjustable system, laying the ground for future Cognitive Intrusion Detection Systems.
This work was conducted by Dr. Kleber Vieira in the scope of his doctorate program in Computer Sciences at the Network and Management Laboratory (LRG), Department of Informatics and Statistics, Federal University of Santa Catarina (UFSC), Brazil. The research was supervised by Prof. Dr. Carlos Becker Westphall and benefited from valuable input from LRG's colleagues. Special thanks to Prof. Dr. Joao Bosco Sobral and Prof. Dr. Jorge Lopes de Souza Leao for their input and contribution. Dr. Fernando Koch contributed to putting this paper together and provided extensive input during the elaboration of the research. Dr. Koch is a Visiting Researcher at LRG/UFSC, Honorary Senior Fellow with The University of Melbourne, Australia, and supported by the Brazilian CNPq Productivity in Technology and Innovation Grant (CNPq 307275/2015-9).
-  F. Lau, S. H. Rubin, M. H. Smith, and L. Trajkovic, “Distributed denial of service attacks,” in Systems, Man, and Cybernetics, 2000 IEEE International Conference on, vol. 3, pp. 2275–2280, IEEE, 2000.
-  R. K. Chang, “Defending against flooding-based distributed denial-of-service attacks: a tutorial,” IEEE communications magazine, vol. 40, no. 10, pp. 42–51, 2002.
-  C. Castelli, B. Gabriel, J. Yates, and P. Booth, “Strengthening digital society against cyber shocks,” tech. rep., PwC Consulting, 2018.
-  A. D. Smith and W. T. Rupp, “Issues in cybersecurity; understanding the potential risks associated with hackers/crackers,” Information Management & Computer Security, vol. 10, no. 4, pp. 178–183, 2002.
-  C. P. Pfleeger and S. L. Pfleeger, Security in computing. Prentice Hall Professional Technical Reference, 2002.
-  S. Subashini and V. Kavitha, “A survey on security issues in service delivery models of cloud computing,” Journal of network and computer applications, vol. 34, no. 1, pp. 1–11, 2011.
-  A. Behl, “Emerging security challenges in cloud computing: An insight to cloud security challenges and their mitigation,” in Information and communication technologies (WICT), 2011 world congress on, pp. 217–222, IEEE, 2011.
-  R. Buyya, R. Calheiros, and X. Li, “Autonomic Cloud computing: Open challenges and architectural elements,” Emerging Applications of …, pp. 3–10, 2012.
-  J. O. Kephart and D. M. Chess, “The vision of autonomic computing,” Computer, no. 1, pp. 41–50, 2003.
-  P. Horn, “Autonomic computing: Ibm’s perspective on the state of information technology,” 2001.
-  C. Modi, D. Patel, B. Borisaniya, H. Patel, A. Patel, and M. Rajarajan, “A survey of intrusion detection techniques in cloud,” Journal of Network and Computer Applications, vol. 36, no. 1, pp. 42–57, 2013.
-  A. Schulter, F. Navarro, F. Koch, and C. B. Westphall, “Towards grid-based intrusion detection,” in Network Operations and Management Symposium, 2006. NOMS 2006. 10th IEEE/IFIP, pp. 1–4, IEEE, 2006.
-  L. Dali, K. Abouelmehdi, A. Bentajer, H. Elsayed, E. Abdelmajid, and B. Abderahim, “A survey of Intrusion Detection System,” in Web Applications and Networking (WSWAN), 2015 2nd World Symposium on, pp. 1–6, IEEE, 2015.
-  N. Stakhanova, S. Basu, and J. Wong, “A taxonomy of intrusion response systems,” International Journal of Information and Computer Security, vol. 1, no. 1, pp. 169–184, 2007.
-  F. Cohen, “Simulating cyber attacks, defences, and consequences,” Computers & Security, vol. 18, no. 6, pp. 479–518, 1999.
-  S. Northcutt and J. Novak, Network intrusion detection. Sams Publishing, 2002.
-  C. A. Carver, “Intrusion response systems: A survey,” Department of Computer Science, Texas A&M University, College Station, TX, pp. 77843–3112, 2000.
-  H. A. Kholidy, A. Erradi, S. Abdelwahed, and F. Baiardi, “A risk mitigation approach for autonomous cloud intrusion response system,” Computing, pp. 1–25, 2016.
-  H. Debar, M. Dacier, and A. Wespi, “A revised taxonomy for intrusion-detection systems,” in Annales des télécommunications, vol. 55, pp. 361–378, Springer, 2000.
-  U. Kumar and B. N. Gohil, “A survey on intrusion detection systems for cloud computing environment,” International Journal of Computer Applications, vol. 109, no. 1, 2015.
-  M. D. Assuncao, F. L. Koch, and C. B. Westphall, “Grids of agents for computer and telecommunication network management,” Concurrency and Computation: Practice and Experience, vol. 16, no. 5, pp. 413–424, 2004.
-  S. Suthaharan, “Big data classification: Problems and challenges in network intrusion prediction with machine learning,” in Big Data Analytics workshop, in conjunction with ACM Sigmetrics, 2013.
-  J. Dean and S. Ghemawat, “Mapreduce: simplified data processing on large clusters,” Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008.
-  S.-H. Ahn, N.-U. Kim, and T.-M. Chung, “Big data analysis system concept for detecting unknown attacks,” in 16th International Conference on Advanced Communication Technology, pp. 269–272, IEEE, 2014.
-  J. Werner, C. M. Westphall, and C. B. Westphall, “Cloud identity management: A survey on privacy strategies,” Computer Networks, vol. 122, pp. 29–42, 2017.
-  R. F. Bordley and S. M. Pollock, “A decision-analytic approach to reliability-based design optimization,” Operations research, vol. 57, no. 5, pp. 1262–1270, 2009.
-  K. Vieira, A. Schulter, C. Westphall, and C. M. Westphall, “Intrusion detection for grid and cloud computing,” It Professional, vol. 12, no. 4, pp. 38–43, 2010.
-  K. M. Vieira, D. S. M. F. Pascal, C. B. Westphall, J. B. M. Sobral, and J. Werner, “Providing response to security incidents in the cloud computing with autonomic systems and big data,” in The Eleventh Advanced International Conference on Telecommunications (AICT 2015)., 2015.
-  R. Briggs, “Normative theories of rational choice: Expected utility,” in The Stanford Encyclopedia of Philosophy (E. N. Zalta, ed.), Metaphysics Research Lab, Stanford University, 2017.
-  P. Biondi, "Packet generation and network based attacks with scapy," CanSecWest/core05, 2005.