
A Survey on Threat Situation Awareness Systems: Framework, Techniques, and Insights

10/29/2021
by   Hooman Alavizadeh, et al.
The University of Queensland

Cyberspace is full of uncertainty in terms of advanced and sophisticated cyber threats, such as AI-powered threats, which are equipped with novel approaches to learn the system and propagate themselves. To counter these types of threats, a modern and intelligent Cyber Situation Awareness (SA) system needs to be developed that is able to monitor and capture various types of threats, analyze them, and devise a plan to avoid further attacks. This paper provides a comprehensive study of the current state of the art in cyber SA and discusses the following aspects of SA: key design principles, framework, classifications, data collection, analysis of the techniques, and evaluation methods. Lastly, we highlight misconceptions, insights, and limitations of this study and suggest some future work directions to address the limitations.


I Introduction

I-A Motivation

Situational Awareness (SA) was first introduced in a comprehensive work by Endsley [41]. Based on Endsley’s definition, an SA system consists of three main components: Perception, Comprehension, and Projection. This work is considered the main reference model for SA research and has been widely extended and applied in a wide range of research contexts. For instance, the application of SA to aircraft was presented in [42], which studied SA in aircraft pilots with the aim of increasing the likelihood of finding optimal decisions in complex real-time situations. In the cyber security domain, Cyber SA may be defined as the preparation, incorporation, processing, and evaluation of data related to a given system in order to understand the system’s environment and to predict and respond accurately to potential cyber threats against the given system or network [17, 7]. Situation Awareness in cyberspace consists of three seminal aspects [17]: (i) Situation Recognition (also called situation perception) deals with identifying the occurrence of an attack along with the type, source, and target of the attack. This aspect involves awareness of the collected data and information in terms of quality, truthfulness, completeness, and freshness. (ii) Situation Comprehension includes attack impact assessment (damage assessment) for both current and future impacts. This aspect also involves awareness of the attacker’s behavior, which considers the attack’s trend and intent, and it requires knowledge of the cause of the current situation. (iii) Situation Projection includes awareness of how the situation evolves and what further effects it may have. A comprehensive design of an SA system would help decision makers to be aware of the current situation and the security posture of the system, and would increase their understanding of the situation up to the decision point. The planning and execution of the response actions occur once the decision is made based on the current situation.

However, most current approaches in the literature for gaining cyber SA focus on the lower and abstract levels of SA techniques, such as vulnerability analysis (which may use attack graphs (AGs)), alert correlation and intrusion detection techniques [86], attack trend analysis [106], information flow and taint analysis [94], causality analysis and forensics, and damage assessment [103]. Higher SA levels, ranging from perception to projection, are still missing and are performed manually by experts, which is time-consuming and error-prone. There is still a lack of SA system designs that can react to a dynamic environment by adapting themselves without intensive interaction with humans or agents.

I-B Major Commercial Perspective

Gartner (https://www.gartner.com/en/documents/3945589) raised a concern over the sheer number of alerts generated by currently available market-level threat monitoring systems (such as IDPS). To reduce the effort required of security analysts who have to deal with isolated alerts, the use of artificial intelligence techniques to group various alerts together into a single incident, or to describe a chain of related activities, has emerged. Gartner also suggests deploying additional threat monitoring sensors inside the network to detect threats that have bypassed traditional controls (e.g., firewalls). Further, Gartner reports that many IDPS vendors are deploying their services into public cloud (i.e., IaaS) environments rather than as organization network firewall solutions, making cloud-based monitoring capability even more critical. The shift has two causes: (1) more organizations tend to move their high-value data and services to the cloud, and (2) organizations want to take advantage of the extra layer of protection provided by cloud vendors.
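
The idea of grouping isolated alerts into incidents can be illustrated with a deliberately simple stand-in. The sketch below is not an AI technique and is not taken from the Gartner report: it assumes a hypothetical `Alert` record and a rule that merges alerts sharing a source IP when consecutive alerts fall within a five-minute window. The field names, the window size, and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alert:
    # Hypothetical alert record; real IDPS alerts carry many more fields.
    time: datetime
    src_ip: str
    signature: str


def group_alerts(alerts, window=timedelta(minutes=5)):
    """Group alerts into incidents by source IP and time proximity."""
    incidents = []
    latest = {}  # src_ip -> index of that IP's most recent incident
    for alert in sorted(alerts, key=lambda a: a.time):
        idx = latest.get(alert.src_ip)
        # Extend the incident if the previous alert from this IP is recent.
        if idx is not None and alert.time - incidents[idx][-1].time <= window:
            incidents[idx].append(alert)
        else:
            latest[alert.src_ip] = len(incidents)
            incidents.append([alert])
    return incidents


base = datetime(2021, 10, 29, 12, 0)
alerts = [
    Alert(base, "10.0.0.5", "port-scan"),
    Alert(base + timedelta(minutes=2), "10.0.0.5", "ssh-brute-force"),
    Alert(base + timedelta(minutes=3), "10.0.0.9", "sql-injection"),
    Alert(base + timedelta(minutes=30), "10.0.0.5", "port-scan"),
]
incidents = group_alerts(alerts)
for number, incident in enumerate(incidents, start=1):
    print(number, [a.signature for a in incident])
```

In this sample, the two early alerts from 10.0.0.5 form one incident, while the late alert from the same address starts a new one. A learned correlation model would replace the fixed rule, but the input and output shapes would stay similar.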

The report by McAfee (https://www.mcafee.com/blogs/other-blogs/mcafee-labs/mcafee-labs-2020-threats-predictions-report/) states that less-skilled adversaries may gain more access and broader capabilities to create and weaponize deepfake content (e.g., using deep learning with adversarial effect). Similarly, it predicts that adversaries will use artificial intelligence to produce extremely realistic text, images, and videos capable of bypassing many biometric-based user authentication mechanisms. The report also raises concern over vulnerabilities in Application Programming Interfaces (APIs) that are exposed to the public to allow access to organizations’ software platforms and app ecosystems. The report shows that attackers tend to shift their attack path from web applications to APIs as a new attack entry point.

Symantec (https://docs.broadcom.com/doc/istr-24-2019-en) shares the latest insight into global threat activity, cyber attacker trends, and attack motivations. The annual report warns of increasing attacks through formjacking, where attackers load malicious code onto retailers’ websites to steal shoppers’ credit card details; it reports that close to 5,000 unique websites were compromised on average every month, including Ticketmaster and British Airways. Multi-faceted attacks that combine multiple techniques into a single attack (such as a link used as a smoke screen, cryptojacking, and phishing) are on the rise to avoid detection, so a detection technique designed to stop a single type of attack is increasingly insufficient. More targeted attacks (e.g., spear phishing) have also increased, infiltrating organizations while using intelligence to gather as much information as possible about the target organization.

I-C Key Design Principles

The fundamental design principle for developing SA systems lies in the understanding of multiple facets of the cyber landscape through the following key concerns:

  • 1– What is happening: ‘What is happening’ in SA refers to detecting whether there is any ongoing attack in the system’s environment and which resources have been compromised. It also includes the impact of an attack. This is a part of perception in SA, and mostly involves automated data gathering tools and pre-processing of the large amount of gathered data. The quantity and quality of the gathered data, such as Intrusion Detection System (IDS) and firewall logs, vulnerability scanning results, and anti-malware log files, determine how effectively the SA system can answer this question.

  • 2– Why is it happening: ‘Why is it happening’ in SA refers to monitoring the system’s environment using vulnerabilities, security holes, and security alerts to be aware of potential threats and attacks. Moreover, this item refers to the way in which the situation is evolving, including attack tracking and the analysis of attack behaviors and strategies. In this stage, reasoning and analysis techniques need to be used heavily.

  • 3– What may happen in future: ‘What may happen in future’ indicates the ability to forecast possible futures along with their probabilities, and to anticipate potential damage. It includes knowledge of the current situation and the possibility of its evolution, along with knowledge about the behaviors of the adversaries. This question is a part of SA projection, which answers questions such as: what situations are possible given the current system components, security posture, and threats? Moreover, in what ways could the current situation further evolve and be exploited?

The main focus of this paper is to answer these seminal questions, which relate to SA monitoring, based on the current state of the art. The answers can further be used by the reaction and response phases, which are mainly planning, response, and prediction. Whether an appropriate reaction or response can be carried out satisfactorily depends greatly on the capability of the SA system to deal with concerns 1–3.

I-D Review of Existing SA Survey Papers

Some efforts have been made to understand state-of-the-art SA systems in the cyber security realm. Here, we compare our survey paper with the existing SA survey papers in terms of their key contributions, key design, classification, and application, together with the following principal questions:

  • Q1: What are the main threats and attack behaviors in a cyber space?

  • Q2: What AI techniques are most commonly exploited by AI-powered attacks and need to be captured in cyber situation awareness design?

  • Q3: How does comprehensive data collection need to be performed to be able to capture most threats?

  • Q4: What are the data types and how can they be classified for use in a situation monitoring context?

  • Q5: What theoretical and empirical techniques have been used in literature to design and develop situation awareness systems?

  • Q6: How to map the situation awareness principles including situation perception, comprehension, and projection to a related situation from low level of understanding to high perception level in a framework such as data gathering, analysis, and gaining high awareness?

  • Q7: What are commonly used tools and prototypes that can be used in analysis of situation awareness?

  • Q8: What are the specific limitations of situation awareness systems at each level?

Franke and Brynielsson [49] conducted a systematic review of the scientific literature on cyber situational awareness. They reviewed and clustered 102 articles in the SA context. Although their categorization and mapping of the reviewed studies by area of focus are thorough, they did not extensively discuss the main techniques and methods of each article in terms of the design, analysis, and development of situation awareness systems.

Leau and Manickam [96] surveyed network security situation forecasting techniques. They categorized the techniques into three categories: machine learning, Markov models, and Grey theory. The authors explained each technique in detail under these categories, which is useful because a reader can understand the fundamentals of each technique. However, they did not provide enough discussion of other possible techniques for different attack behaviours. Moreover, their research does not enable readers to find answers to the principal questions Q4–Q8.

Husák et al. [70] published a survey paper on situation awareness focusing on prediction and forecasting methods used in cyber security. They extensively reviewed and categorized the methods and techniques into (i) discrete models (such as attack graphs, Markov models, and Bayesian networks), (ii) continuous models (such as time series and grey models), and (iii) machine learning and data mining approaches. However, they mainly discussed the situation projection level, and basic and critical concepts of situation awareness such as data collection and pre-processing were missing from their paper. Although this classification covers a large portion of the literature, it misses other perspectives of SA such as implementations and tools. As a result, from the reader’s perspective, it is challenging to find answers to the principal questions Q1–Q3 and Q7.

I-E Key Contributions & Scope

The main contributions of this survey paper are as follows:

  • We extensively surveyed the Situational Awareness (SA) frameworks and classified them based on three main parts: data gathering, techniques and analysis, situation awareness and visualization.

  • We surveyed the most commonly used approaches, techniques, and methodologies used in the existing literature to develop SA systems, which embrace the theoretical backgrounds of anomaly analysis, Artificial Intelligence (AI), Game Theory, Machine Learning (ML), and so forth, and highlight the limitations.

  • We discussed the various application prototypes and tools which have been used or applied to SA techniques and systems.

  • We discussed misconceptions, insights, and limitations obtained from this extensive survey.

I-F Paper Structure

This paper is organized as follows:

  • Section II explains the attack and threat behaviors including advanced and AI-based threats discussed by existing SA studies.

  • Section III surveys the main data gathering approaches for SA systems. This section also includes explanation of data types and data sources used for SA monitoring. Further, we classify those types of data based on the different criteria such as availability, accessibility, complexity of use, and usability for SA system.

  • Section IV provides a comprehensive review and classification of the existing techniques and approaches used to analyze cyber SA in various systems and contexts, along with the discussions of main limitations of each technique.

  • Section V discusses the main situation awareness phases, including threat evaluation, decision making, and planning, along with visualization leading to a high-level understanding of the system situation at the projection level.

  • Section VI discusses the insights and lessons learned from our study and suggests future research directions.

II Main Threats and Attack Behaviours

As ICT continues to evolve, so do the incidences of threats and attacks. In this section, we survey the existing literature on threats and attacks considered in SA systems. In addition, we survey the state-of-the-art techniques for AI-based cyber-attacks and the techniques for monitoring such attacks. Specifically, we use the MITRE ATT&CK model, which groups attacks based on adversary tactics and techniques, to describe the characteristics of the attacks. In addition, we use Microsoft’s STRIDE threat model [33] to map the attacks to their corresponding threats. The STRIDE threat model captures the unique characteristics of attacks that pose a particular type of threat.

II-A Threat and Attack Landscape

The STRIDE threat model categorizes threats into six categories: Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, and Elevation of Privilege. Spoofing occurs when an adversary disguises itself by falsifying data or information to gain an illegitimate advantage. Tampering occurs when an adversary modifies components to disrupt operations. Repudiation arises when an adversary denies having performed actions because those actions cannot be properly tracked. Information Disclosure is the leak of confidential information to people who are not supposed to see it. Denial of Service occurs when valid users are denied resources (e.g., memory, bandwidth) because the adversary exploits the system’s vulnerabilities. Finally, Elevation of Privilege occurs when an adversary gains unauthorized privileges by exploiting the system’s weaknesses.

The ATT&CK model groups attacks into the following tactics: Initial Access, Execution, Persistence, Privilege Escalation, Defense Evasion, Credential Access, Discovery, Lateral Movement, Collection, Command and Control, Exfiltration, and Impact. We describe the attack categories below, and we provide a summary of the attack techniques used in SA systems in Table I (grouped according to attack tactics).

  • Initial Access: An adversary can use different vectors to gain entry and a foothold in a system. Initial access covers the attack techniques that give the adversary this first entry. This type of early-stage attack has been considered in many SA studies. Brancik and Ghinita [23] considered an insider attack where malicious software is planted on the organization’s computer to allow remote access to other machines. The malware is expected to affect certain files before the adversary stages an attack. As a result of the initial access, other attack tactics such as Execution and Exfiltration may follow.

  • Execution: Execution consists of injecting adversary-controlled code into a program, remotely or locally. Execution is often linked to other attack tactics. For example, the work in [39] presented modeling and detection of attacks for the SA system where an adversary is able to execute a script or command with a service control manager as part of the lateral movement.

  • Persistence: An adversary’s access can be lost as a result of changes to the system (e.g., a changed password or a system restart). Persistence consists of the techniques used by the adversary to maintain a continuous foothold on the system. One example of persistence techniques is presented by He et al. [60] for a cyber SA system in an IoT network. In that work, He et al. considered an adversary who is able to extend persistence by using malware. Specifically, the attacker is able to enable persistent root privilege on a TV in a smart home via an Android tablet portal.

  • Privilege Escalation: An adversary can exploit system vulnerabilities to gain the permissions of a user account, or even gain higher privileges than a normal user account. The adversary can then use the newly gained account privileges to damage the system. In privilege escalation, an adversary may steal normal user account credentials to gain initial access before escalating the user privilege to root privilege, in order to gain full control of the system or perform a lateral movement in the network [174, 167, 91, 60].

    Threat columns (STRIDE): Spoofing, Tampering, Information Disclosure, Denial of Service, Elevation of Privileges
    Attack tactic (ATT&CK): attack techniques
    Initial Access: trusted relationship [23]; valid account [147]; fake event [105]; spear-phishing [53]
    Execution: exploit execution [53]; code execution [125]
    Privilege Escalation: unauthorized write [174]; hijack/DoS [174]; elevation of privileges [167]; root access [91]; user privileges [174]; root privileges [174]
    Defense Evasion: backdoor [147]; authentication bypass [91]; unauthorized access [174]
    Persistence: extended privileges [60]
    Credential Access: phishing [153]; defacement [147]; credential leak [152]; password brute force [147]
    Discovery: probing & Mscan [162]; abnormal scan [52]; IP sweep [91]; connections discovery [174]; reconnaissance [72]; system service [162]
    Lateral Movement: internal spearphishing [167]; internal spearphishing [167]; sequential attack [39]; multi-step attacks [60]
    Collection: information collection [72]; information leakage [91]
    Command and Control: run command and control for DoS [55]; have full control [125]
    Exfiltration: move and install false data [105]; data exfiltration [52]
    Impact: modify files [60]; DoS [152, 167]; Smurf & Mailbomb [162]; DDoS [55]
    TABLE I: A review of different attack techniques and threats used in existing SA systems
  • Defense Evasion: Defense evasion consists of the techniques used by an adversary to avoid detection during attacks. As a result of these techniques, an adversary can gain full trust in the targeted system. For instance, an adversary may take advantage of vulnerable components in a web application to bypass security rules and access a database. This may also give the adversary the ability to remotely run system commands and install malware. Once the malware is installed, detection is difficult because the data will be highly obfuscated. Other examples include disabling security tools and deleting registry entries [147, 91, 174].

  • Credential Access: Credential access consists of the techniques used by an adversary to steal account and password credentials to achieve their objectives. Various attack techniques such as phishing, man-in-the-middle, brute force, and network sniffing may be employed to leak or steal credentials from a company. This type of attack technique has been studied for SA systems in [152, 147].

  • Discovery: An adversary may perform passive pre-attack information gathering before getting into a system. By doing so, the adversary can gain knowledge of the target system, including entry points and how to achieve their objectives. In Table I, we show examples of discovery techniques used in the existing SA systems. For example, Wu et al. [162] proposed an SA mechanism based on the analysis of big data in the smart grid. In their system, they considered IP sweep and Mscan attack techniques used to gain knowledge of the active devices.

  • Lateral Movement: An adversary can exploit a sequence of systems and accounts to reach their objective. For example, in a three-tier system, an adversary has to perform a multi-stage attack across multiple systems and accounts to compromise a target in the last tier. For instance, the works in [39, 60] considered attack scenarios for SA systems where the adversary combines different steps sequentially to launch an attack on a specific target, and the outcome of one step serves as an input to the subsequent steps.

  • Collection: An adversary may use various post-compromise information gathering techniques before stealing data from the target system. Such techniques include automated internal data collection and user email data collection. Ioannou et al. [72] showed an example of an adversary collection technique based on sensitive data from a database residing on a system prior to exfiltration. Specifically, the adversary collects the data and uploads it to an adversary’s server during exfiltration via specially crafted malware.

  • Command and Control: An adversary may establish command and control to direct compromised targets, often by blending in with normal traffic. For instance, an adversary may communicate over a commonly used open port as a means of relaying commands and controlling compromised systems.

  • Exfiltration: Exfiltration involves the techniques that an adversary employs to move or copy data out of a system without authorization. Lu & Feng [105] developed a cyber SA framework for an industrial control system under an attack in which the integrity of the entire system may be compromised: unauthorized commands are injected into systems to create fake events, and data is then moved to the adversary.

  • Impact: An attacker can impact system availability and integrity by manipulating or destroying its operations using various techniques. Here, the adversary can change the normal data route to a database in order to provide cover for a confidentiality breach. The techniques used can also include the tampering of data [60].
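
The taxonomy described above can be encoded as a small data structure for indexing surveyed techniques. The following is a minimal sketch under stated assumptions: the `Stride` enum and `TACTICS` list follow the names given in this section, `SAMPLE_ROWS` is a hand-picked subset of Table I entries (tactic, technique, citation) chosen only for illustration, and the helper names `index_by_tactic` and `tactics_without_examples` are hypothetical. The sample rows carry no STRIDE label; attaching one would be a separate modeling choice.

```python
from collections import defaultdict
from enum import Enum


class Stride(Enum):
    # The six STRIDE threat categories named in this section.
    SPOOFING = "Spoofing"
    TAMPERING = "Tampering"
    REPUDIATION = "Repudiation"
    INFORMATION_DISCLOSURE = "Information Disclosure"
    DENIAL_OF_SERVICE = "Denial of Service"
    ELEVATION_OF_PRIVILEGE = "Elevation of Privilege"


# The twelve ATT&CK tactics listed in this section.
TACTICS = [
    "Initial Access", "Execution", "Persistence", "Privilege Escalation",
    "Defense Evasion", "Credential Access", "Discovery", "Lateral Movement",
    "Collection", "Command and Control", "Exfiltration", "Impact",
]

# Illustrative subset of Table I: (tactic, technique, citation).
SAMPLE_ROWS = [
    ("Initial Access", "spear-phishing", "[53]"),
    ("Persistence", "extended privileges", "[60]"),
    ("Credential Access", "password brute force", "[147]"),
    ("Discovery", "IP sweep", "[91]"),
    ("Lateral Movement", "multi-step attacks", "[60]"),
    ("Exfiltration", "data exfiltration", "[52]"),
    ("Impact", "DDoS", "[55]"),
]


def index_by_tactic(rows):
    """Map each tactic to the (technique, citation) pairs recorded for it."""
    index = defaultdict(list)
    for tactic, technique, citation in rows:
        if tactic not in TACTICS:
            raise ValueError(f"unknown tactic: {tactic}")
        index[tactic].append((technique, citation))
    return dict(index)


def tactics_without_examples(rows):
    """List tactics (in canonical order) that have no recorded technique."""
    covered = {tactic for tactic, _, _ in rows}
    return [tactic for tactic in TACTICS if tactic not in covered]


index = index_by_tactic(SAMPLE_ROWS)
missing = tactics_without_examples(SAMPLE_ROWS)
print(index["Impact"])
print(missing)
```

The list of tactics without examples only reflects the illustrative subset used here, not the coverage of Table I itself.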

II-B Advanced Attacks

Adaptive (advanced) attacks are known as intelligent attacks. In these types of attacks, the attackers adapt to dynamically changing system conditions and external environmental conditions, as they take into consideration both physical and cyber accessibility. These attackers are also intelligent with regard to their resources, executing adaptive attacks [145] that wisely manage their resource limits and at the same time opportunistically seek to compromise an entire system. Kaloudi and Li [81] investigated AI-powered cyber attacks and mapped them onto a proposed framework with new threats, including a classification of several aspects of the malicious use of AI during the cyber-attack life cycle. In the following, we discuss advanced cyber-attacks from the existing literature.

AI offers significant benefits in terms of innovation and automation in different domains. However, cyber-criminals utilize AI technologies to improve their attack strategies in conjunction with the conventional attack techniques discussed in Section II-A. It is important for SA systems to take into account AI-powered attacks as well as the methods to mitigate them, yet only a few works attempt to develop SA systems that do so. Jiang [77] developed an approach to improve SA representation and learning using collective AI (path-based embedding and graph neural networks) over knowledge graphs. Specifically, the author introduced four ideas for prediction with collective AI for SA: prediction ensemble, data aggregation, representation aggregation, and joint representation learning. Here, we survey the state-of-the-art techniques for AI-based cyber-attacks and then map them to the techniques used for monitoring and defending against them. This provides decision-makers with insight into AI-based attacks and their detection and mitigation approaches.

Attackers can utilize AI techniques to carry out attacks, and in other cases attackers can exploit weaknesses of AI-based techniques to successfully compromise a system. We categorize the attacks into AI-supported attacks and adversarial attacks. We summarize them in Table II and discuss them as follows.

Category | Paper | Attack type | Technique | Detection/Mitigation
AI-supported attacks | [129] | Target discovery, automated spear phishing | Long short-term memory and Markov chains (deep learning) | Using a detection system by incorporating the new synthetic URLs
AI-supported attacks | [63] | Credential access | Deep learning and GANs | Defenses have not been implemented yet
AI-supported attacks | [147] | Credential access | ML algorithm (Torch-rnn) | Use AI-based password brute force algorithms to prevent users from choosing poor new passwords
AI-supported attacks | [135] | Deep Locker | Deep neural networks | No defense is described
AI-supported attacks | [164] | Credential access | Probabilistic context-free grammars [157] and recurrent neural network model | Dynamic personalized password policy based on user’s personality traits [56], interpretable probabilistic password strength meters via deep learning [120]
AI-supported attacks | [117] | Credential access (credential tweaking) | A generative model based on sequence-to-sequence learning; a discriminative model based on word embedding techniques | Personalized password strength meters using NN-based word embedding techniques, password strength meters
AI-supported attacks | [15], [129] | Bypass AI phishing detection systems | Deep neural networks | Detection system by incorporating the new synthetic URLs
AI-supported attacks | [87] | Concealment against antivirus, disable countermeasures, unlocks malicious payload | Deep neural networks | No defense is implemented; however, the authors in [87] recommended host-based monitoring, AI usage monitoring, AI lock picking, etc.
AI-supported attacks | [168] | Crowdturfing attacks | Recurrent neural networks | Using lossy transformation introduced by the RNN training and generation cycle
Adversarial attacks | [62] | Evasion/deception | GANs | No defense is implemented, but [62] recommended fully homomorphic encryption [137], privacy-preserving collaborative learning [132], and differential privacy [1] at different granularities
Adversarial attacks | [13] | Information leak | Meta-classifier | -
Adversarial attacks | [118] | Deception against DNNs | Crafting algorithm | Improve the training phase (e.g., multilayer feedback [64])
Adversarial attacks | [146] | Model extraction attack | Generic equation-solving attack for models with a logistic output layer; path-finding algorithm with decision trees | Rounding confidences, differential privacy, ensemble methods [146]
Adversarial attacks | [68] | Defense evasion | GAN-based algorithm | Autoencoders to map adversarial samples to clean input data, defensive distillation [119]
TABLE II: AI-based cyber attacks, their techniques and mitigation

II-B1 AI-based attacks

In this section, we discuss attacks supported by AI, where advanced technologies are leveraged to power cyberattacks. Hitaj et al. [63] presented an approach to generate high-quality password guesses by automatically learning the distribution of real passwords from actual password leaks using deep learning and Generative Adversarial Networks (GANs). Their results showed that the approach surpasses rule-based and other ML password guessing tools, even without any a priori knowledge of the passwords. Seymour & Tully [129] presented a program named Network Automated Phishing and Reconnaissance, which is designed to exploit social media users based on two models: long short-term memory (LSTM) and Markov chains. Specifically, the program targets Twitter users who are more vulnerable to social engineering attacks by analyzing their tweets; based on those tweets, it creates relevant replies with a shortened, obfuscated link to achieve a phishing attack on the target. Stoecklin [135] presented an AI-powered evasive malware tool named Deep Locker that is able to conceal itself in other applications until it is unlocked by certain trigger conditions such as geo-location, facial recognition, software, and user activity. It is difficult for defenders to determine the pattern of Deep Locker, and hence it is complex to develop defense countermeasures against it [87]. Rhodes [126] provided an approach to infect multiple systems using automated self-propagating malware. The author also described how this propagation can be achieved across a large number of systems while avoiding possible detection. In addition, other research showed that AI-supported attacks can be faster and more efficient [133].

II-B2 Adversarial attacks

In this section, we discuss attacks on ML models. Hitaj et al. [62] implemented an inference attack on deep neural networks in a collaborative setting. In particular, they showed how an attacker can exploit the real-time nature of the learning process to deceive a victim into releasing more accurate information about sensitive data. Their work showed that a distributed, federated, or decentralized deep learning approach can be attacked and thus cannot protect the participants’ information (from the training sets). Ateniese et al. [13] presented an approach to attack ML classifiers and extract statistical information that can be revealed from them. The authors developed a meta-classifier that is trained to attack other ML classifiers in order to retrieve sensitive information or patterns from the training set. They demonstrated how their approach could obtain unauthorized information about participants from trained voice recognition systems, a leak that is not prevented by privacy-preserving models or differential privacy. Papernot et al. [118] showed how an adversary can manipulate the input fed to deep learning models that are used to infer and reveal the identity of objects or persons in a blurred image. The authors showed that an adversary can make the model misclassify the input, thus producing incorrect outputs. Tramèr et al. [146] presented attacks against the online services of Amazon and BigML, showing that an adversary with black-box access, but no prior knowledge of an ML model’s parameters or training data, can steal the model using only its predictions on input feature vectors. In addition, they showed that the natural countermeasure of omitting confidence values from model outputs still admits potentially harmful model extraction attacks. Hu et al. [68] proposed a GAN-based method to generate adversarial malware that is able to bypass black-box machine learning-based detection models.
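
To make the equation-solving style of model extraction listed in Table II concrete, the sketch below works through its simplest case under strong assumptions: the target is a plain logistic regression model, the attacker receives exact confidence values, and the attacker knows the number of features. It is an illustration of the idea only, not a reproduction of the attacks in [146]; the function names and the toy parameters are hypothetical. Since logit(p) = w·x + b is linear in (w, b), querying d + 1 inputs yields a linear system whose solution recovers the parameters.

```python
import numpy as np


def make_black_box(w, b):
    """Return a prediction oracle that hides w and b from the caller."""
    def predict_proba(x):
        return 1.0 / (1.0 + np.exp(-(x @ w + b)))
    return predict_proba


def extract_logistic_model(predict_proba, n_features, rng):
    """Recover (w, b) of a logistic model from n_features + 1 queries."""
    # Random query points; with probability 1 the system below is solvable.
    X = rng.normal(size=(n_features + 1, n_features))
    p = predict_proba(X)
    # Invert the sigmoid: logit(p) = w . x + b is linear in the unknowns.
    logits = np.log(p / (1.0 - p))
    # Append a column of ones so the bias is solved with the weights.
    A = np.hstack([X, np.ones((n_features + 1, 1))])
    solution = np.linalg.solve(A, logits)
    return solution[:-1], solution[-1]


rng = np.random.default_rng(0)
true_w = np.array([0.5, -1.2, 2.0])  # toy parameters, unknown to the attacker
true_b = 0.3
oracle = make_black_box(true_w, true_b)

w_hat, b_hat = extract_logistic_model(oracle, n_features=3, rng=rng)
print(np.allclose(w_hat, true_w), np.isclose(b_hat, true_b))
```

The sketch depends on exact confidence values, which is why rounding confidences appears among the mitigations in Table II; as noted above, [146] reports that omitting confidence values still does not rule out extraction.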

Several methods have been developed to mitigate both AI-supported and adversarial attacks. However, it is still challenging to implement effective countermeasures, as the proposed solutions remain susceptible to other forms of attack. Moreover, they have not been incorporated into SA systems. The Defense Advanced Research Projects Agency (DARPA) has utilized deep learning and neural networks over the past decades to develop machine-speed defensive systems capable of detecting, evaluating, and patching vulnerabilities in real time while probing the attacker’s system [34]. DARPA is still working on a program known as Cyber-Hunting at Scale (CHASE) to detect and characterize new attack vectors, collect relevant data, and deploy countermeasures [35]. It is a work in progress that seeks to develop an automated tool, based on both ML and cyberattack modeling tools, to counter advanced attackers.

III Data Gathering

The capability of an SA system is highly related to the quality of its data collection. The system needs to collect information about the environment, mainly from different sources. This information helps the system make decisions based on the collected data and the knowledge gained and, consequently, respond to threats. Data collection can further improve the quality of knowledge so that better decisions can be made about future threats.

Various types of data sources need to be used for SA. We classify these data types, based on criteria such as availability, accessibility, complexity of use, and usability for the SA system, into five types: dynamic, one-off, alert-based, intelligence sharing, and raw data. Each type is detailed as follows:

  • Dynamic: This type consists of data produced by vulnerability scanning tools (such as Nessus) or network data gathered from the network topology or configuration. How often the data are updated depends on the type of network. However, dynamic data need to be refreshed whenever network components change, because such changes may alter the vulnerabilities, which must then be captured again.

  • One-off: This type is usually produced by experts, for example as incident reports, and is mostly static [45].

  • Alert-based: This type consists of data produced by Intrusion Detection Systems or other alert-based systems such as Snort, Tripwire, and so forth [138].

  • Intelligence sharing: This type is obtained through communication with other parties or from external threat intelligence sources, such as Open Indicators of Compromise (OpenIOC) and the Malware Information Sharing Platform (MISP). These sources usually provide updated information on recent vulnerabilities and malware.

  • Raw data: Any other raw data fall into this group, such as packet sniffing output, system log files, SNMP traps, traffic dumps, OS audit logs, firewall logs, etc. [162].
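The classification above can be captured in a small lookup structure. The sketch below is one possible encoding, assuming the pre-processing effort and source labels reported in Table III (I for internal sources, E for external sources, H for honeypot); the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataType:
    name: str
    preprocessing: str    # "Low", "Moderate", or "High", as listed in Table III
    sources: frozenset    # subset of {"I", "E", "H"}

DATA_TYPES = [
    DataType("dynamic", "Moderate", frozenset("I")),
    DataType("one-off", "High", frozenset("IEH")),
    DataType("alert-based", "Low", frozenset("IH")),
    DataType("intelligence sharing", "Low", frozenset("E")),
    DataType("raw", "High", frozenset("IH")),
]

_EFFORT = {"Low": 0, "Moderate": 1, "High": 2}

def available_from(source):
    """Names of the data types obtainable from a given source label."""
    return [d.name for d in DATA_TYPES if source in d.sources]

def by_effort():
    """Data types ordered from lowest to highest pre-processing effort."""
    return sorted(DATA_TYPES, key=lambda d: _EFFORT[d.preprocessing])
```

For example, `available_from("H")` lists the data types that a honeypot deployment can contribute.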

The data types discussed above can be obtained in various ways. We further survey how these data types can be collected and used for SA systems, including the platforms, tools, and resources involved. Fig. 1 shows the multi-level data type pyramid model for SA.

III-A Honeypot

Honeypots are systems designed to deceive attackers into believing they are interacting with a real information asset, in order to understand attacker behaviour and intentions. The main advantage of honeypots is that they minimise false positives, because a honeypot is not a production asset and so no legitimate user should be accessing it. Some false positives are still caused by web crawlers or similar systems such as network measurement tools [123].
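The reasoning above suggests a very simple triage rule: every connection to a honeypot is suspicious unless it comes from a known benign scanner. The following sketch illustrates that rule only; the allowlist addresses and the record layout are hypothetical.

```python
# Hypothetical allowlist of benign sources (e.g. web crawlers, measurement tools).
KNOWN_BENIGN = {"198.51.100.7", "198.51.100.8"}

def triage_honeypot_hits(connections, benign=KNOWN_BENIGN):
    """Split honeypot connection records into (suspicious, ignored) lists."""
    suspicious, ignored = [], []
    for conn in connections:
        target = ignored if conn["src_ip"] in benign else suspicious
        target.append(conn)
    return suspicious, ignored

hits = [
    {"src_ip": "203.0.113.5", "dst_port": 22},
    {"src_ip": "198.51.100.7", "dst_port": 80},   # known crawler
]
suspicious, ignored = triage_honeypot_hits(hits)
```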

The application of honeypots in designing and creating SA systems has been studied widely [16]. Honeypots provide a source of accurate, timely, and concise information for SA systems. They can be used to capture large-scale malicious activity through traffic inspection, and to collect and classify data that are then fed into an intrusion detection system to give a more precise view of the current situation of the monitored network. Barford et al. [16] proposed a daily network security monitoring system that uses honeypots to collect data and then classifies and summarizes the data to provide ongoing SA. Thonnard and Dacier [143] leveraged malicious Internet traffic data obtained from a distributed set of honeypot responders (i.e. honeynets) to capture time series of attacks and to cluster the attack patterns. Similarly, Chawda and Patel [30] proposed a distributed honeypot system that aims to monitor and detect new vulnerabilities.

Fig. 1: Multi-level data type pyramid model for SA

Moreover, in [89], the authors developed a honeypot system that acts as a malware data collector and is able to capture self-propagating malware and monitor its activity. Sun et al. [136] proposed a novel framework for modelling and clustering attackers’ activities using data collected by a world-wide honeypot deployment. The data collected from the honeypots are fed into an analysis module that uses a Bayesian probabilistic graphical model and a graph-based clustering algorithm to classify attacks and monitor attackers’ activities. The application of honeypots to the monitoring and detection of botnets has also been studied in the literature [100, 44]. Attackers exploit the growth of the Internet of Things (IoT) as a viable source for launching extensive attacks through botnets. Botnets built from vulnerable IoT devices give attackers a platform for launching wide-ranging attacks against any system or network [84]. For example, a Distributed Denial-of-Service (DDoS) attack can be launched against cloud systems from compromised IoT devices (botnets) by flooding traffic packets from various sources, causing service interruption for users. Honeypots can help to capture malware activities launched from botnets. In [100], the authors used a honeypot to monitor the traffic passing between the honeypot and botnets and to extract the malicious botnet activities. Fachkha and Debbabi [44] studied the application of honeypots and darknets to monitor and detect various malicious threats launched from the Internet, such as DDoS, worms, and botnets.

III-B Intelligence Sharing Platforms

SA systems need data related to threats and malicious activities. Information sharing is an essential process in detecting security breaches and proactively protecting information systems and infrastructures. SA data (i.e. raw data) can be collected at both lower and higher levels, as data is converted into abstract information. However, lower-level data can overwhelm decision makers’ cognitive capacities, so relying solely on low-level data is insufficient for situation awareness. To create an automated SA system, the data gathering phase should use technical platforms, tools, standards, and secure information exchange protocols to obtain related data such as higher-level threat intelligence [174]. Intelligence sharing tools such as Open Indicators of Compromise (OpenIOC) and the Malware Information Sharing Platform (MISP), or classic data sources like CERTs and Open-Source Intelligence (OSINT), can be used to receive this information [134]. To create a holistic SA system, a large amount of dynamic data needs to be monitored, refined, and processed in real time. Yen et al. [169] used hypothesis-driven information gathering (called “gathering of evidence”) to address the challenge of processing this large amount of dynamic data when building an SA framework.
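As an illustration of how shared intelligence might be consumed, the sketch below matches locally observed events against a set of indicators of compromise. It does not use the API of MISP, OpenIOC, or any other platform: the indicator set, field names, and values are hypothetical placeholders drawn from reserved documentation ranges.

```python
# Hypothetical indicators, as they might look after import from a sharing platform.
INDICATORS = {
    "ip": {"203.0.113.50"},
    "domain": {"malicious.example"},
}

def match_indicators(event, indicators=INDICATORS):
    """Return the (field, value) pairs of an event that match a shared indicator."""
    hits = []
    for field, bad_values in indicators.items():
        if event.get(field) in bad_values:
            hits.append((field, event[field]))
    return hits

events = [
    {"ip": "10.0.0.8", "domain": "intranet.example"},
    {"ip": "203.0.113.50", "domain": "malicious.example"},
]
flagged = [(e, match_indicators(e)) for e in events if match_indicators(e)]
```

Matching against higher-level indicators in this way keeps the decision maker away from the raw, low-level data discussed above.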

III-C Vulnerability and Network Discovery

Vulnerability scanning tools can scan network devices and software as well as cloud infrastructure to reveal configuration errors, unpatched and vulnerable devices, and known vulnerabilities [4, 6]. SA can leverage the data collected by vulnerability scanning tools for further analysis. For instance, some vulnerabilities can be patched, while others may need additional countermeasures. However, to create a real-time SA system, vulnerability scanning tools should scan periodically or continuously. Network scanning can also be used to create SA. It yields network-based information such as devices, platforms, operating systems, and open ports and services, which can be used by the SA monitoring system. There are various vulnerability databases and sources, such as Common Vulnerabilities and Exposures (CVE), the National Vulnerability Database (NVD), OSVDB (https://blog.osvdb.org/), X-Force (https://www.ibm.com/security/xforce), and BugTrack (http://www.securityfocus.com/). A vulnerability database entry mainly includes a unique identifier, a description, the publishing date, a score from a vulnerability scoring system such as CVSS (https://www.first.org/cvss, 2018), and some security metrics. Nessus, QualysGuard, MaxPatrol, OVAL, and GFI LanGuard are examples of tools that have been used in studies to obtain vulnerability information for SA systems [138].
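As a small example of using scoring data for SA, the sketch below maps a CVSS v3.x base score to its qualitative severity rating and orders scan findings so that the most severe come first. The rating bands follow the published CVSS v3.x scale; the findings and their identifiers are hypothetical.

```python
def cvss_v3_severity(score):
    """Map a CVSS v3.x base score (0.0 to 10.0) to its qualitative rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"

def prioritize(findings):
    """Order findings from highest to lowest CVSS score."""
    return sorted(findings, key=lambda f: f["cvss"], reverse=True)

# Hypothetical scan results; the identifiers are placeholders, not real entries.
findings = [
    {"id": "VULN-A", "cvss": 5.3},
    {"id": "VULN-B", "cvss": 9.8},
    {"id": "VULN-C", "cvss": 7.5},
]
ranked = [(f["id"], cvss_v3_severity(f["cvss"])) for f in prioritize(findings)]
```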

| Data Type | Example of sources | Pre-processing | Source | Pros | Cons | References |
|---|---|---|---|---|---|---|
| Dynamic | Vulnerability scan data; network topology | Moderate | I | Available; include metrics | Need frequent update | [7] |
| One-off data | Incident reports | High | I–E–H | More precise | Less availability; need experts | [45, 82] |
| Alert-based | IDS; alerts (e.g., Snort alerts, Tripwire alerts) | Low | I–H | No inspection required; real-time (alert based) | Inability to detect unknown attacks | [138] |
| Intelligence sharing | CERT; MISP; OSINT | Low | E | More precise; updated information | Need external sources, sharing, and trust, so less available and slow | [82, 174] |
| Raw | Packet sniffing; system log file; SNMP traps; other, like traffic dumps, OS audit logs, firewall logs | High | I–H | More available | Changes frequently; complexity, i.e. parsing and analysis; needs frequent inspection | [162] |

Notations: Internal Sources (I) - External Sources (E) - Honeypot (H)

TABLE III: Data types classification based on various factors for SA systems.

Network and system information such as topology, configuration, and components is important for monitoring the network’s activities based on the available components and connectivity. Network flows can be collected and correlated with security events. They may also be useful for attack representation techniques such as attack trees and attack graphs [90]. Various network discovery and mapping tools such as Lumeta IPsonar, SteelCentral NetCollector (formerly OPNET NetMapper), Nmap, and JANASSURE are useful for SA and have been used in studies [138]. These tools screen the incoming and outgoing traffic in the network and monitor crucial files on the host operating system.

III-D Network Traffic Inspection

Network traffic inspection has been used in many studies to collect the data required for creating SA systems. Vinayakumar et al. [152] collected DNS data passively by using promiscuous mode and reading the mirrored traffic of the DNS communication between DNS clients and servers. The data include the DNS queries and the DNS answer to each query exchanged between the client and the DNS server. They used the collected data, such as malware propagation and activities, prefix announcements, route announcements, and update information, to identify malicious activities. Moreover, antivirus software (AVS), a traditional countermeasure, can be used to collect data for SA creation. Many AVS products produce log data about detected malware and can generate log data about network traffic, both of which can be utilized by SA. Wu et al. [162] applied a data extraction method based on factors and rules related to the design of their proposed SA system for the smart grid. The extraction rules were defined based on the system’s requirements. Situational factors store the security-related information and heterogeneous schemes according to a specified format. They collected the semi-structured and unstructured data required for SA using basic situational factors such as network flow, access control operations, and device states. They performed data collection based on network traffic inspection by inspecting Simple Network Management Protocol (SNMP) data flows, which are used to manage devices on TCP/IP networks. Snort, TCPdump, Bro, Ntop, and WireShark are examples of network traffic inspection tools that have been used in studies to obtain traffic-based information, such as logs and traffic flows, for SA systems [138].

III-E Intrusion Detection System (IDS) & Firewall Data

An IDS plays an important role in monitoring individual devices and the system’s network traffic. An IDS can leverage various monitoring techniques, such as monitoring and analyzing log events, and signature-based and anomaly-based detection. For instance, packet IP addresses and action sequences (remote user logins followed by manipulation of critical files) are examples of what signature-based monitoring inspects. In contrast, comparison against a reference usage profile is categorized as anomaly-based monitoring. For instance, when a normal user such as the office secretary uses tools that are normally used only by the administration team, this can be considered abnormal usage behaviour and may be detected. Moreover, IDSs should be able to comprehensively monitor traffic anomalies to gain clear situation awareness based on the sources, destinations, and volume of traffic. Other aspects, such as security policy violations and system integrity, can also be detected by an IDS. IDSs such as TripWire, OSSEC HIDS, and Snort have been used in studies to capture abnormal events on both networks and individual hosts.
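The difference between the two monitoring styles can be sketched in a few lines. The example below is illustrative only: the signature, the role names, and the tool names are hypothetical and do not reflect the rule format of any particular IDS.

```python
# Hypothetical signature: a remote login directly followed by a critical file change.
SIGNATURES = [("remote_login", "modify_critical_file")]

# Hypothetical reference usage profiles: tools each role normally uses.
USAGE_PROFILE = {
    "secretary": {"email", "word_processor"},
    "admin": {"email", "user_manager", "log_viewer"},
}

def signature_alerts(actions):
    """Signature-based: report known-bad action sequences found in the action log."""
    alerts = []
    for sig in SIGNATURES:
        n = len(sig)
        if any(tuple(actions[i:i + n]) == sig for i in range(len(actions) - n + 1)):
            alerts.append(sig)
    return alerts

def anomaly_alerts(role, tools_used):
    """Anomaly-based: report tools that fall outside the role's reference profile."""
    expected = USAGE_PROFILE.get(role, set())
    return sorted(set(tools_used) - expected)

sig_hits = signature_alerts(["remote_login", "modify_critical_file", "logout"])
anomalies = anomaly_alerts("secretary", ["email", "user_manager"])
```

The signature check can only find sequences it has been told about, while the profile check flags any deviation, including behaviour that has never been seen before.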

III-F Data Gathering Limitation

As the most essential part of SA system design, the data gathering process is still crude and needs further consideration. More cost-efficient data gathering and pre-processing need to be investigated. The limitations of data collection for SA are listed as follows [90], and the pros and cons of each data type are given in Table III:

  • Different data types need various sources, which increases the complexity of data collection and of real-time monitoring.

  • Although honeypots and honeynets are crucial sources of information about real threat scenarios, most of the data gathered using honeypots are unstructured and unorganized. It is crucial to automate the organizing and condensing of honeypot data so that they are useful for further SA evaluation and real-time analysis.

  • Various sources provide different data representations and formats, which increases the pre-processing burden. A standard, unified dataset that incorporates data types from different sources into a standardized form is still missing in the literature.

  • The continuous growth of the data volumes collected and stored for further analysis (e.g. using honeypots) may overwhelm the system. A system should therefore be designed to remove unnecessary data and preserve the portion that may be used for further processing (such as learning purposes). Thus, systematic data gathering is needed to extract only useful information from large traffic data sets.
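The format heterogeneity described in the list above is often handled by mapping each source into one common record layout before analysis. The sketch below shows the idea for two sources; both input formats and all field names are hypothetical and are not taken from any specific firewall or IDS product.

```python
def from_firewall_line(line):
    """Parse a hypothetical firewall log line: '<time> <action> <src> -> <dst>'."""
    ts, action, src, _arrow, dst = line.split()
    return {"time": ts, "source": "firewall", "src_ip": src,
            "dst_ip": dst, "detail": action}

def from_ids_alert(alert):
    """Convert a hypothetical IDS alert dictionary to the common layout."""
    return {"time": alert["ts"], "source": "ids", "src_ip": alert["src"],
            "dst_ip": alert["dst"], "detail": alert["msg"]}

events = [
    from_ids_alert({"ts": "2021-10-29T10:00:05", "src": "203.0.113.5",
                    "dst": "10.0.0.2", "msg": "port scan"}),
    from_firewall_line("2021-10-29T10:00:00 DENY 203.0.113.5 -> 10.0.0.2"),
]
# With a shared layout, events from different sources can be ordered and correlated.
timeline = sorted(events, key=lambda e: e["time"])
```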

IV Analysis and Techniques

This section provides a comprehensive review and classification of the existing methods and techniques used to analyze cyber SA in various systems and contexts. Gaining a high level of situation awareness depends on three main hierarchical steps, each of which provides a level of situation understanding, from low to high. SA phases can be classified into three main categories: data gathering, analysis and techniques, and situation awareness, as shown in Fig. 2.

Fig. 2: Hierarchical structure of situation awareness and the related framework showing the level of situation awareness.

He and Li [59] defined three main layers of awareness for a comprehensive SA monitoring system. The first layer, called the data extraction layer, monitors and analyzes abnormal network traffic using network traffic monitoring and analysis tools. For this purpose, sample data can be extracted from the alarm database and then processed. The second layer is the key layer of the system; it is used to understand and evaluate the network security situation and to determine whether or not the network is under attack. It processes and evaluates attacks based on the current system situation and, using the evaluation model for the specific situation, generates a corresponding graph that reflects the current status and security posture of the network. Finally, the third layer predicts the network security situation from the second layer’s output, combining the current network security situation with other network security data. This stage gives security experts higher-level information about the network security situation and provides a basis for making reasonable decisions.

IV-A Data Pre-processing

All collected data, especially raw and unformatted data, need to be parsed, cleansed, and normalized before being fed to the next step for further analysis. Data pre-processing consists of various steps such as data cleansing (including duplicate elimination), data normalization, and collation. For instance, data cleansing may include calibration and filtering of the raw data collected from security sensors (IDS, network and system log records, firewall, SIEM, NetFlow, and so on). Data processing for SA system design and implementation has been discussed in various studies [144, 181]. Zhong et al. [181] conducted an extensive study of data triage operations in situational awareness analysis. Their work focuses on cyber defense analysts who perform data triage on network monitoring data, and their proposed framework helps to automate the retrieval of the data triage process for analysts.
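Two of the steps named above, duplicate elimination and normalization, can be sketched as follows. This is a minimal illustration that assumes records are flat dictionaries and uses min-max scaling as the normalization method, which is a choice made here for the example rather than one prescribed by the surveyed studies.

```python
def deduplicate(records):
    """Drop exact duplicate records while preserving the original order."""
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

def min_max_normalize(values):
    """Rescale numeric values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

raw = [
    {"src_ip": "203.0.113.5", "bytes": 10},
    {"src_ip": "203.0.113.5", "bytes": 10},   # duplicate sensor report
    {"src_ip": "203.0.113.9", "bytes": 20},
    {"src_ip": "203.0.113.7", "bytes": 30},
]
clean = deduplicate(raw)
scaled = min_max_normalize([r["bytes"] for r in clean])
```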

Data Refining and Fusion: Most of the collected input data, especially the raw data collected by IDS, network traffic inspection, log files, etc., as shown at the top of the pyramid in Fig. 1, carry a higher pre-processing burden. Pre-processing mainly includes parsing and refining the raw data before feeding the refined data to the analysis modules. For instance, Shen et al. [130] defined three levels for collected data: (i) object refinement, or level-1 data fusion, which pre-processes and refines raw data collected from various sources such as IDS and system and web log files, (ii) situation refinement, or level-2 data fusion, which deals with the pre-processed data resulting from level 1, and (iii) threat refinement, or level-3 data fusion, which consists of the high-level data resulting from level-2 data fusion. This hierarchical view of data fusion is part of the data processing used to determine the final situation awareness and the designed impact assessment system. Similarly, [173] proposed a data fusion model for a Resilient Control System that aggregates data from various IDS sources and pre-processes them in two levels. Object refinement uses a combination of IDSs to identify an individual attack. The results of object refinement are then fed into situation refinement, which derives a simple attack characterization by fusing the defensive posture. Later on, the pre-processed and fused data from those two levels should be fed into higher, analysis-based levels such as threat refinement. In [141], the authors proposed a data fusion algorithm for SA that aims to reduce network traffic redundancy in big data through feature extraction, classification, and integration. Feature extraction is the key component of the data fusion algorithm and helps to address dimensional complexity. The extracted features represent the original big data by capturing its internal characteristics.

However, cybersecurity problems often comprise a very large set of features. Utilising all the features has two main drawbacks: first, more examples are needed to build and train a model, and second, not all features are relevant or needed. Hence, performing feature selection and feature construction can largely mitigate this problem and reduce the dimensionality of the feature space. Furthermore, the vast majority of those features are handcrafted by domain experts. An automated approach that extracts features from the raw data, or a model that can operate directly on the raw data, can reduce the effort and cost associated with manually designing and extracting those features.
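One of the simplest forms of feature selection is to drop features that barely vary across examples, since they carry little information for a model. The sketch below applies such a variance threshold; it is a generic filter chosen for illustration, not the selection method of any surveyed paper, and the feature names and values are hypothetical.

```python
from statistics import pvariance

def select_features(rows, names, threshold=0.0):
    """Keep only the columns whose population variance exceeds the threshold."""
    keep = [j for j in range(len(names))
            if pvariance([row[j] for row in rows]) > threshold]
    kept_names = [names[j] for j in keep]
    kept_rows = [[row[j] for j in keep] for row in rows]
    return kept_names, kept_rows

names = ["packet_count", "protocol_version", "failed_logins"]
rows = [
    [120, 4, 0],
    [300, 4, 7],
    [ 80, 4, 1],
]
kept_names, kept_rows = select_features(rows, names)
```

Here `protocol_version` is constant across all examples, so it is removed and the feature space shrinks from three dimensions to two.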

IV-B Using Artificial Intelligence

IV-B1 Machine Learning

In computer science, machine learning (ML) is a subfield of the wider artificial intelligence (AI) field [128] and one of the most rapidly evolving fields [9]. ML comprises a set of algorithms that aim at automatically, i.e., without being explicitly programmed, extracting useful patterns or knowledge from data to solve a problem [10]. Generally, ML methods can be categorised into four approaches: (1) supervised learning, (2) unsupervised learning, (3) semi-supervised learning, and (4) reinforcement learning [158].

  • Supervised Learning: The methods of this approach rely on labelled examples, i.e., the correct output (also known as ground truth) is provided, and are guided by those labels during the training process to learn a model. Hence, such methods aim at generating a generalised mapping between the inputs and the corresponding labels [20]. The two typical tasks of this approach are classification and regression.

  • Unsupervised Learning: The methods of this approach are concerned with unlabelled example inputs, and therefore the main aim is to use predetermined criteria to divide those examples into different groups. A typical task of this approach is clustering.

  • Semi-supervised Learning: The methods of this approach combine the previous two approaches. Such methods are needed where the problem at hand comprises a mixture of labelled and unlabelled data. The transductive support vector machine is a typical example of this approach [78].

  • Reinforcement Learning: The methods of this approach aim at developing an agent that can automatically explore an environment and take appropriate actions. The learning process of such methods relies on maximising a cumulative reward and minimising the penalty.
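One common way to formalize the cumulative reward mentioned for reinforcement learning is the discounted return. The notation below is standard textbook notation rather than notation taken from the surveyed papers: $r_t$ is the reward received at step $t$ (a penalty is a negative reward), and $\gamma$ is an assumed discount factor.

```latex
G_t = \sum_{k=0}^{\infty} \gamma^{k} \, r_{t+k+1}, \qquad 0 \le \gamma < 1
```

Under this formulation, the agent seeks a policy that maximises the expected value of $G_t$, so rewards that arrive sooner count more than rewards that arrive later.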

Adenusi Dauda et al. [3] developed a threat detection model that aims to capture the condition of cyberspace. They proposed an SA model using Artificial Intelligence (AI) techniques. They used Artificial Neural Networks (ANN), also called neural networks (NN), to create the perception sub-model. Moreover, they leveraged Rule-Based Reasoning (RBR) techniques to model the SA comprehension and projection phases. Champaneria and Panchal [28] proposed a model for detecting novel and unknown attacks. They developed an intrusion detection model using a hybrid ANN approach. They showed that the proposed hybrid ANN model outperforms the other methods in terms of attack classification, training time, and detection rate.

SA prediction is one of the crucial purposes of an SA system. Various methods of network security situation prediction have been proposed based on different ANN models such as the Back Propagation (BP) Neural Network (BP-NN) [175, 139], the Radial Basis Function (RBF) Neural Network [175], the Elman Neural Network (ENN) [176], and others [110]. For instance, Tang et al. [139] proposed a network security SA prediction method based on a dynamic BP neural network using covariance. They improved their method by including self-learning dynamic adjustment of the parameters’ weights.

Zhang et al. [175] proposed a prediction-based SA approach based on two neural network models: BP-NN and RBF-NN. They compared the two models and showed that the BP-NN model is more effective than the RBF-NN model at predicting network SA. Zheng et al. [179] proposed a self-adaptive and real-time SA strategy named Network Security Situation Autonomic Awareness (NSSAA). They adopted a BP-NN model to realize self-learning adjustment of the input data.
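To show what BP-NN based situation prediction can look like in its simplest form, the sketch below trains a one-hidden-layer network with back propagation to predict the next value of a situation time series from a sliding window of past values. It is a generic textbook network, not a reproduction of any cited model: the series, the window size, and all hyperparameters are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_bp(samples, n_hidden=4, lr=0.5, epochs=2000, seed=0):
    """Train on (inputs, target) pairs with values in [0, 1] using plain SGD."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Each weight row ends with a bias term.
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + row[-1]) for row in w1]
        out = sigmoid(sum(w * h for w, h in zip(w2, hidden)) + w2[-1])
        return hidden, out

    def mse():
        return sum((forward(x)[1] - y) ** 2 for x, y in samples) / len(samples)

    initial_error = mse()
    for _ in range(epochs):
        for x, y in samples:
            hidden, out = forward(x)
            delta_out = (out - y) * out * (1.0 - out)
            # Hidden deltas are computed before the output weights are updated.
            delta_hidden = [delta_out * w2[h] * hidden[h] * (1.0 - hidden[h])
                            for h in range(n_hidden)]
            for h in range(n_hidden):
                w2[h] -= lr * delta_out * hidden[h]
            w2[-1] -= lr * delta_out
            for h in range(n_hidden):
                for i in range(n_in):
                    w1[h][i] -= lr * delta_hidden[h] * x[i]
                w1[h][-1] -= lr * delta_hidden[h]
    return (lambda x: forward(x)[1]), initial_error, mse()

# Hypothetical normalized situation values observed over time.
series = [0.2, 0.3, 0.5, 0.4, 0.6, 0.7, 0.5, 0.6, 0.8, 0.7]
window = 3
samples = [(series[i:i + window], series[i + window])
           for i in range(len(series) - window)]
predict, err_before, err_after = train_bp(samples)
next_value = predict(series[-window:])
```

The error values returned by `train_bp` make it easy to check that training actually reduced the prediction error on the historical samples.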

A hybrid ANN model for SA was proposed in [176]. The authors combined three NN models, including BP-NN and RF-NN, to predict SA for a computer network. They showed that the combination of NN models can yield better and more efficient SA prediction.

However, in most of the proposed ANN models, the authors performed error analysis and predicted error values, which were later used to train the prediction models. Thus, the models could only be improved by feeding previous prediction errors into the prediction model as training samples. Moreover, the success of ANN models highly depends on the training samples, the algorithms, and the quality of training [96].

The application of supervised learning to SA prediction has been studied in [31]. Chen et al. [31] used echo state networks (ESNs) with the small-world property as a supervised learning method and proposed a network security prediction method that is trained on historical attack records. Moreover, in [66], the authors proposed an SA prediction model based on support-vector machines (SVM), which learns from a large volume of input data taken from the KDD dataset. They showed that the model effectively reduces the SVM training time while enhancing the accuracy of SA prediction.

Vinayakumar et al. [153] used different deep learning architectures to detect spam and phishing attacks from Uniform Resource Locator (URL) and email data sources, since attackers commonly use email and URLs to spread malware. They conducted their experiments on various datasets collected from public and private data sources and used classical ML algorithms for a comparative study. Vinayakumar et al. [152] proposed a scalable framework for network situational awareness that is able to perform web-scale analysis in near real time, detect threats, and emit early warning signals to avoid malware propagation and large-scale attacks. They employed a deep learning approach to correlate malicious activities obtained from DNS protocol usage. Dietterich et al. [38] used ML methods to capture the behavior of ordinary desktop computer users.

SVM-based approaches are effective for SA modeling and analysis, including the monitoring and prediction of SA. However, most of these models suffer from long training times, mainly for the SA prediction model, which is the main drawback of these types of ML-based approaches [67].

IV-B2 Evolutionary Computation

Evolutionary computation (EC) is concerned with biologically inspired algorithms [14]. EC algorithms can be broadly categorised into two groups [14]: Evolutionary Algorithms (EAs) [80] and Swarm Intelligence (SI) [40, 178]. EC algorithms are considered to be global searchers because they rely on a population of candidate solutions, unlike other ML algorithms that search the space using a single candidate solution.

Liang et al. [101] proposed an SA system that incorporates evolutionary algorithms and neural network models. They used the evolutionary algorithm to optimize the parameters of the neural network model and finally quantify the network SA. However, their model could only analyze a limited set of situational factors, mainly related to SA perception. Later on, Lin et al. [102] proposed an SA model that incorporates a BP neural network and Particle Swarm Optimization (PSO) to predict and project the future situation. Incorporating PSO into the model can provide a globally optimized solution. However, the model lacks generalization to new samples and relies only on the trained samples. Similarly, Meng et al. [110] proposed a network security SA prediction method that combines RBF-NN with a hybrid hierarchy genetic algorithm.

Li and Liu [97] proposed an SA extraction method based on improved particle swarm optimization (IPSO) and the logistic regression (LR) algorithm, which can find a global optimum and improve the learning speed and accuracy thanks to the intrinsic parallelism and optimization capabilities of IPSO. Moreover, Zhao and Liu [178] proposed an SA system that adopts PSO in a wavelet-NN model to monitor large-scale data environments and find the optimal solution. They showed the effectiveness of combining ANN models with PSO methods to achieve faster and more accurate network situation awareness.

Moreover, combining ANN methods with EC-based methods such as PSO can enhance the SA system in terms of multiple factors, such as the NN’s ability to learn, its learning speed, its accuracy, and the effectiveness of the solution, while compensating for limitations of traditional NN algorithms such as network training errors and low search success rates [178].
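The population-based search that PSO contributes can be sketched compactly. The implementation below is a basic global-best PSO with commonly used textbook parameter values; it is not the improved variant (IPSO) or any configuration from the cited works. In the surveyed approaches the objective would be a network's training error as a function of its weights; here a simple sphere function stands in for it as an assumption.

```python
import random

def pso(objective, dim, bounds, n_particles=20, iters=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box using basic global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def sphere(x):
    """Stand-in objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

best, best_val = pso(sphere, dim=2, bounds=(-5.0, 5.0))
```

Because every particle is also attracted to the best position found by the whole swarm, the search is less likely to stall in a poor local optimum than a single gradient-following searcher.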

IV-B3 Limitation of AI-based Techniques

Although ML-based and EC-based approaches are useful in terms of self-learning, the automation capabilities they bring to SA, and the ability to be combined with other methods, they have some main limitations, summarized as follows:

  • Dependency on an up-to-date dataset that includes data on novel and unknown threats for training purposes. Honeypots play an essential role in addressing this shortcoming by capturing real and novel threats [25].

  • A lack of adequate training samples or appropriate training models may cause undesirable results [96]. Trained samples should be used productively as the input of the model so that it can capture or predict the incoming security situation. Evolutionary computation methods mostly need a large amount of prior knowledge to extract the situational elements, which might be difficult to obtain.

IV-C Game Theory

Game theory (GT) provides mathematical models that can be used in cyber SA to study the game behaviors between attackers and defenders [21]. Here, we review GT-based approaches, covering both the design and the application of game theory, to present the current state of the art of GT in SA systems and its limitations.

Indeed, the main goal of the game-theoretic SA approaches is to predict the adversary behavior against defender. This prediction can be used to provide an advantage to the defender [7]. Through game-theoretic analysis, the defender can theoretically prove the attacker’s best strategy; consequently, the best defensive strategy can be used. Several game-based security awareness methods have been proposed [174, 177, 99].

Various studies have used traditional GT-based approaches for SA and monitoring systems. Shen et al. [130] proposed a Markov stochastic game model for designing an SA system that is able to estimate and detect cyber attack patterns based on the collected data. They showed that their proposed GT-based model can enhance the understanding of the network situation and help with proper defense. Wang et al. [154] proposed a stochastic game theory model for quantifying network situation awareness. They used a network offense and defense game based on the network service states to derive the payoffs of both sides and quantify the situation. However, applying game theory over network state spaces has a scalability issue, as the state space can be extremely large, especially in dynamic networks, and solving the state combination problem is time-consuming. Later on, a stochastic game model for SA that could capture larger network states in a dynamic network environment was proposed in [171]. However, it still suffers from scalability problems when solving the state combination problem for larger network states.

The applications of GT-based evaluation which can be utilized for SA models have been studied in the literature [29, 5]. In [159], the authors implemented a GT-based model to analyze several attack scenarios on Online Social Networks (OSNs), to obtain a clear perspective of the system's vulnerabilities against various attacks and, consequently, to provide protection mechanisms against those attacks. They applied a Markov decision process (MDP) in their GT model to secure information sharing in online social networks.

Zhang et al. [177] proposed an approach to improve SA based on the Markov Game Model (MGM) by gathering data regarding threats, assets, and vulnerabilities and evaluating them in real-time. In their model, users, network administrators, and attackers constitute the three players of the MGM. They showed that the evaluation result is efficient and precise. Moreover, Zhang et al. [174] leveraged a game-theoretic approach to defend against threats in a cloud environment using threat intelligence. They used the Nash equilibrium together with a fuzzy optimization method to predict the attack behavior. Ying et al. [171] proposed a game-theoretic dynamic SA system by modeling both attackers and defenders as game players using a stochastic game model of the network. Then, they quantified the network SA by incorporating the game's mathematical formulation and the attacker and defender costs. They used the Nash equilibrium to find a balance between the attacker's and the defender's benefits.
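
To make the Nash-equilibrium reasoning used in these studies concrete, the following minimal sketch solves a two-strategy zero-sum attacker-defender game. It is an illustration only and is not taken from any of the cited works: the function name `solve_2x2_zero_sum`, the zero-sum assumption, and the payoff values are all hypothetical.

```python
from fractions import Fraction

def solve_2x2_zero_sum(payoff):
    """Solve a 2x2 zero-sum game (illustrative sketch, hypothetical payoffs).

    payoff[i][j] is the defender's gain when the defender plays row i and
    the attacker plays column j; the attacker's gain is the negative of it.
    Returns (p, q, value): p is the probability that the defender plays
    row 0, q is the probability that the attacker plays column 0.
    """
    (a, b), (c, d) = payoff
    # First look for a saddle point, i.e. a pure-strategy equilibrium.
    row_mins = [min(a, b), min(c, d)]
    col_maxs = [max(a, c), max(b, d)]
    maximin, minimax = max(row_mins), min(col_maxs)
    if maximin == minimax:
        p = Fraction(1 if row_mins[0] == maximin else 0)
        q = Fraction(1 if col_maxs[0] == minimax else 0)
        return p, q, Fraction(maximin)
    # Otherwise each side mixes so that the opponent is indifferent.
    denom = Fraction(a - b - c + d)
    p = Fraction(d - c) / denom
    q = Fraction(d - b) / denom
    value = Fraction(a * d - b * c) / denom
    return p, q, value

# Hypothetical example: rows = defender monitors host A / host B,
# columns = attacker targets host A / host B.
p, q, value = solve_2x2_zero_sum([[2, -1], [-1, 1]])
```

In this toy game neither side has a pure best strategy, so both randomize. Stochastic and Markov game models extend the same idea to many states, which is where the scalability concerns discussed above arise.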

However, GT-based approaches have some limitations, such as the lack of full rationality of the involved players (attackers and defenders) and the incompleteness of information [32]. With advanced adversaries, this problem gets even worse, as an intelligent attacker may learn about the defender and the environment. To this end, GT-based approaches need to be combined with learning approaches and knowledge awareness to increase monitoring and defensive capabilities against advanced adversaries.

Iv-D Hybrid Approaches

Various studies have applied multiple approaches for designing SA systems, such as combining learning-based techniques with GT [162, 165, 155]. Wu et al. [162] proposed a security situational awareness mechanism using a fuzzy-cluster-based analytical method, game theory, and reinforcement learning. They analyzed big data in the smart grid, aiming to gain security situational awareness in the smart grid environment. Xing-zhu [165] proposed a method to determine the situation of the system based on a Fuzzy Dynamic Bayesian network. Their simulation results were compared with a static Bayesian network model, and they showed that the proposed method better reflects the dynamic changes in the network. In [3], the authors used fuzzy equivalence relations to perform cluster analysis together with association analysis of the situational factors in big data.

Moreover, Chung et al. [32] proposed a hybrid approach incorporating GT and a model-free reinforcement learning method (called Q-learning [155]) to enhance the monitoring and defensive capabilities of the designed system and to address the main weakness of GT-based approaches, namely the lack of information about the ability or intent of attackers.
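
For readers unfamiliar with Q-learning, the standard tabular update rule can be sketched as follows. This is a generic textbook form, not the implementation of [32]; the state and action names, the learning rate `alpha`, and the discount factor `gamma` are hypothetical choices.

```python
def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value.

    q maps (state, action) pairs to estimated values; missing entries
    are treated as 0.0. The update needs no model of the attacker,
    which is why the method is called model-free.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]

# Hypothetical defender: it observed a scan, blocked it, and was rewarded.
q_table = {}
defender_actions = ["block", "allow"]
q_update(q_table, "scan", "block", 1.0, "idle", defender_actions)
```

Because the defender learns action values directly from observed rewards, it does not need prior knowledge of the attacker's ability or intent.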

However, there is still a lack of appropriate training datasets for learning-based approaches, which have various applications when combined with other approaches such as GT and fuzzy models. Thus, further study is needed to improve learning-based and hybrid approaches using real data, and to test them with real-time traffic logs, data, and specific detection using a real adversary data collection testbed such as honeypots.

Moreover, the quality of SA monitoring in effectively detecting and analyzing the system's environment is still limited for various reasons, including (i) inaccurate and incomplete data gathering and pre-processing, vulnerability analysis, and intrusion detection, (ii) limited ability to quickly and automatically adapt to the evolving and dynamic nature of networks, (iii) limited capability to deal with intelligent attackers and AI-powered threats (such as sophisticated and complex attacks), (iv) limited capability to deal with uncertainty, and (v) limited capability for large-scale real-time data collection.

Iv-E Triage Analysis

Much like triage in medicine, triage in cybersecurity refers to a set of automated techniques that quickly assess a security incident to determine whether it requires further investigation. The term has become especially popular in malware detection, where triage is an efficient mechanism for analyzing and identifying, among a massive amount of malware, the specific samples that require urgent attention, which matters for the many organizations with limited resources. By and large, most triage techniques currently on offer go through a number of phases, including (1) feature extraction, (2) similarity measurement, and (3) clustering. Most often, features are extracted from malware and then clustered according to the result obtained by a similarity metric. The features from the different clusters are compared with known signatures. If no match is found, they can often be classed as potential zero-day attacks, which are typically ranked highly for further investigation.

The current triage techniques can be broadly classified into two groups: those that deal with malware features extracted as categorical data, and those that deal with malware binary files. BitShred [74] and VILO [93] extract features based on N-grams, which have long been used in text analysis. BitShred further applies feature hashing [131] to the extracted features, which dramatically reduces dimensionality by compressing a large feature space into a smaller one, so that the hashed representation of the features takes less space in memory and is more cache-friendly. BitShred compares the hashed features using the Jaccard similarity metric, while VILO uses a weighting scheme [79] that calculates how frequently a word (i.e., feature) appears in the feature vector and compares the similarity of the weights. BitShred uses a co-clustering method to correlate both the hashed features and the malware samples, which is claimed to discover more substantial, non-trivial structural relationships among malware samples. VILO uses the nearest-neighbor algorithm based on the weighted similarity scores to form clusters. Instead of N-grams, MAST [27] extracts features from qualitative data that represent each mobile app (e.g., permissions, intent filters, the presence of native code); this set of attributes is called a questionnaire. The similarity among the collected questionnaires is calculated using a statistical method called Multiple Correspondence Analysis (MCA) [2], which measures the correlation between multiple qualitative data, followed by grouping related mobile apps together (i.e., clustering) so that uniqueness within each group is more specific. In contrast, SigMal [88] uses signal-processing-based feature extraction, where the executable binary content is treated as a one-dimensional signal represented as a vector of bytes. The vector of bytes is converted into filtered feature vectors [114]. The authors claim that filtered feature vectors based on the executable binary content are better equipped to preserve the features of the original malware sample even when the malware is disguised by polymorphic engines or general packers (e.g., encryption or compression techniques). The Euclidean distance between feature vectors is used to find the nearest-neighbor sample in the learning dataset.
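
The N-gram, feature-hashing, and Jaccard steps described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not BitShred's actual implementation: the hash function (SHA-256), the fingerprint size `num_bits`, the N-gram length, and the sample bytes are all hypothetical choices.

```python
import hashlib

def byte_ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of distinct n-byte substrings of a sample."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}

def hashed_fingerprint(ngrams: set, num_bits: int = 1024) -> set:
    """Feature hashing: map every n-gram to one of num_bits positions.

    The fingerprint is returned as the set of positions that are set,
    so its size is bounded by num_bits regardless of the sample size.
    """
    positions = set()
    for gram in ngrams:
        digest = hashlib.sha256(gram).digest()
        positions.add(int.from_bytes(digest[:8], "big") % num_bits)
    return positions

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Two hypothetical samples that share most of their bytes.
sample_a = b"push ebp; mov ebp, esp; call decrypt; ret"
sample_b = b"push ebp; mov ebp, esp; call unpack; ret"
similarity = jaccard(hashed_fingerprint(byte_ngrams(sample_a)),
                     hashed_fingerprint(byte_ngrams(sample_b)))
```

Hashing trades a small chance of collisions for a fixed-size representation, which is what makes pairwise comparison over very large sample sets tractable.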

The main focus of research in triage techniques is either to improve detection accuracy, which is determined by the algorithms used for similarity metrics and clustering, or to speed up computation so as to filter through as many samples as possible. For example, BitShred was shown to speed up typical malware triage tasks by up to 2,365x and to use up to 82x less memory on a single CPU, making it better suited to large-scale malware triage and similarity detection. The results for VILO show between 0.14% and 5.42% fewer misclassifications compared to similar methods. MAST was able to detect 95% of the malware among 36,710 mobile apps used as test samples. SigMal could classify 50% of the incoming samples with above 99% precision and was shown to be able to detect, on average, 70 malware samples per day before any antivirus software detected them.

Iv-F Anomaly Detection

Anomaly detection can be incorporated into SA to detect abnormal behavior of system components such as users, traffic, and access, and to give the system a clearer view of its current situation based on normal and abnormal activities. Abnormal behaviors are the activities inside a system that deviate from the normal or expected behaviors. Thus, the core of anomaly detection is the monitoring of normal system operation to find any deviation from the normal model [98].

Friedberg et al. [51] distinguished three kinds of anomalies: (i) point anomalies, which involve a single event that can be considered anomalous given the notion of normality, (ii) contextual anomalies, which refer to an event that can be considered anomalous in a given context, so that the anomaly can be inferred using the event's behavioral attributes in its context, and (iii) collective anomalies, which indicate a series of events that together are considered anomalous. To precisely assess the current situation of the system, the SA system should be able to detect all three kinds of anomalies using the anomaly detection methods embedded in it. The application of anomaly detection in SA has been studied in the literature [58, 98, 108]. For instance, Harrison et al. [58] proposed an anomaly detection method to detect low-probability events for SA.
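
The difference between a point anomaly and a contextual anomaly can be illustrated with a simple z-score test. The sketch below is an assumption-laden toy example and not the method of [51] or [58]: the threshold of three standard deviations, the function names, and the login counts are hypothetical.

```python
from statistics import mean, pstdev

def is_point_anomaly(value, history, threshold=3.0):
    """Flag a value that lies far from the mean of all past values."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold

def is_contextual_anomaly(value, context, history_by_context, threshold=3.0):
    """Flag a value that is unusual for its own context only."""
    return is_point_anomaly(value, history_by_context[context], threshold)

# Hypothetical logins per hour, split by time of day.
history = {"night": [1, 2, 1, 2], "day": [50, 52, 48, 50]}
all_values = history["night"] + history["day"]

# 50 logins is unremarkable overall, but abnormal at night.
globally_odd = is_point_anomaly(50, all_values)
odd_at_night = is_contextual_anomaly(50, "night", history)
```

A collective anomaly would require looking at a sequence of events rather than a single value, for example by applying the same test to a windowed aggregate.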

Iv-F1 ML-based Approaches

ML-based approaches have been widely used for gaining situation awareness through anomaly detection [108, 127]. Various ML techniques have been used, such as symbolist approaches using random forests [108] and decision trees, connectionist approaches leveraging neural networks (NN) [107], and evolutionary approaches which mainly mimic genetics or the immune system [36]. Moreover, other techniques such as Bayesian methods [150] and analogistic approaches using support vector machines (SVM) [111] have also been used in the literature. However, learning-based approaches need existing datasets for training and testing purposes. We further explain two popular datasets used for this purpose.

Datasets: Most learning-based anomaly detection techniques use one of two datasets, the KDD-Cup 1999 dataset and the NSL-KDD dataset, which are the most popular training and testing datasets.

  • KDDCup 1999 [142]: This is a popular dataset which has been widely utilized for intrusion detection and anomaly detection methods. The training dataset includes around 5,000,000 single connection vectors, each of which contains 41 features and is labeled as either normal or an attack. The attack labels fall into four categories: DoS, User to Root Attack (U2R), Probing, and Remote to Local Attack (R2L).

  • NSL-KDD [83]: NSL-KDD is another dataset which has been used in ML-based methods. It addresses the shortcomings of the KDDCup 1999 dataset, which includes a large amount of redundant or duplicated data records: around 75% and 78% of the testing and training datasets, respectively. This redundancy could bias the learning algorithm and cause wrong results. To address this problem, NSL-KDD was introduced as a new version of the KDDCup 1999 dataset and has been widely adopted for anomaly detection.
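
The redundancy issue can be made concrete with a short sketch that measures and removes duplicate connection records. It only illustrates the idea of de-duplication; it is not the procedure used to build NSL-KDD, and the records shown are hypothetical.

```python
def deduplicate(records):
    """Return the records with exact duplicates removed, order preserved."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

def redundancy_rate(records):
    """Fraction of records that are duplicates of an earlier record."""
    if not records:
        return 0.0
    return 1 - len(deduplicate(records)) / len(records)

# Hypothetical (duration, protocol, label) connection records.
records = [(0, "tcp", "normal")] * 3 + [(1, "udp", "dos")]
rate = redundancy_rate(records)
```

With many repeated records, a learner is rewarded mostly for getting the frequent records right, which is the bias described above.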

Feature Manipulation:

Feature manipulation is an important data pre-processing step for anomaly detection, especially for classifying high-dimensional data. It mainly refers to the process of transforming the input space of a machine learning task, aiming to enhance the quality and performance of learning-based techniques. Feature manipulation is concerned with feature selection, feature construction, and feature extraction. Feature selection aims at selecting a subset of the original features by removing irrelevant, redundant, and noisy features [166]. Feature construction aims at generating a new feature or set of features by considering various combinations of the original features [115].
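
As a minimal sketch of filter-style feature selection, the code below drops (near-)constant features as irrelevant and drops features that are almost perfectly correlated with an already kept feature as redundant. The thresholds, the function names, and the feature values are hypothetical, and real pipelines typically use more elaborate criteria.

```python
from math import sqrt
from statistics import mean, pvariance

def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def select_features(columns, min_variance=1e-9, max_corr=0.95):
    """Return the names of the features that survive both filters."""
    kept = []
    for name, values in columns.items():
        if pvariance(values) <= min_variance:
            continue  # irrelevant: the feature never changes
        if any(abs(pearson(values, columns[k])) > max_corr for k in kept):
            continue  # redundant: duplicates information already kept
        kept.append(name)
    return kept

# Hypothetical traffic features for four connections.
columns = {
    "duration": [1, 2, 3, 4],
    "duration_ms": [1000, 2000, 3000, 4000],  # same information, other unit
    "flag": [0, 0, 0, 0],                     # constant
    "bytes": [10, 3, 8, 1],
}
selected = select_features(columns)
```

Here `duration_ms` and `flag` are removed, leaving two features for the downstream classifier.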

Fiore et al. [46] proposed a semi-supervised network anomaly detection approach based on the Discriminative Restricted Boltzmann Machine (DRBM) [95]. They used the DRBM to capture the main aspects of the normal traffic class and then perform accurate classification. Another learning-based technique was developed by Salama et al. [127]. They studied an anomaly intrusion detection scheme using a deep learning method called Deep Belief Network (DBN) [37]. They combined an SVM classifier with a DBN used for feature reduction, yielding a hybrid DBN-SVM scheme. Their hybrid methodology includes three main phases: pre-processing, DBN feature reduction, and classification.

Iglesias and Zseby [71] proposed an anomaly detection method for network traffic based on feature selection. They utilized a multi-stage feature selection method using filters and step-wise regression wrappers. Later, a more advanced anomaly detection model based on deep learning was proposed by Javaid et al. [76]. They developed an anomaly detection-based system (ADNIDS) to detect unknown future attacks using a deep learning approach. They introduced two main steps: feature extraction, applied to the collected unbalanced network traffic data, and supervised classification, which uses the extracted features to label the traffic dataset. They used the NSL-KDD dataset as training data and evaluated the performance of the approach using metrics such as accuracy, precision, recall, and f-measure.
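
The evaluation metrics mentioned above follow standard definitions over the confusion matrix. The sketch below computes them for a binary normal/attack labeling; the label names and the example predictions are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive="attack"):
    """Compute accuracy, precision, recall, and f-measure for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}

# Hypothetical ground truth and detector output for five connections.
truth = ["attack", "attack", "normal", "normal", "attack"]
predicted = ["attack", "normal", "normal", "attack", "attack"]
metrics = classification_metrics(truth, predicted)
```

On heavily imbalanced traffic, accuracy alone can be misleading, which is why precision, recall, and f-measure are usually reported alongside it.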

Further in [86], the authors proposed a hybrid anomaly detection model based on incorporating Long Short Term Memory (LSTM) and Recurrent Neural Network (RNN). They trained their proposed deep learning model using KDD Cup 1999 dataset and showed the effectiveness of their approach to detect the attacks in comparison with other learning based anomaly detection techniques. Similarly, Tang et al. [140] utilized Deep Neural Network (DNN) model for anomaly detection in Software Defined Networking (SDN) environment. They used NSL-KDD Dataset for training their deep learning model and showed the effectiveness of DNN to monitor and detect various attacks in the SDN environment.

Iv-F2 White-list and Black-list Analysis:

Blacklisting is a classical approach that helps a monitoring system detect malicious activities by maintaining a list of known threats or activities. In contrast, the white-list technique gathers and classifies information about reliable sources for legitimate uses [153]. The application of white-list and black-list techniques has been studied in the literature [43, 22, 112]. In [112], the authors proposed an authentication-based approach which classifies access to Uniform Resource Locators (URLs) based on three defined lists: white-list, black-list, and gray-list. They showed that the proposed system can improve the accuracy of the gray-list, white-list, and black-list classification of suspicious URLs, and further reduce the authentication frequency for users accessing the URLs.
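
A three-list lookup of this kind can be sketched as follows. This is only an illustration of the white/black/gray idea and not the system of [112]: the host names are hypothetical, and the assumption that gray-listed URLs are the ones requiring additional authentication is ours.

```python
from urllib.parse import urlparse

def classify_url(url, white_list, black_list):
    """Return 'black', 'white', or 'gray' for a URL based on its host."""
    host = (urlparse(url).hostname or "").lower()
    if host in black_list:
        return "black"   # known malicious: deny
    if host in white_list:
        return "white"   # known legitimate: allow
    return "gray"        # unknown: treat with caution

# Hypothetical lists.
white = {"intranet.example.org"}
black = {"malware.example.net"}
label = classify_url("https://Malware.example.net/payload", white, black)
```

Checking the black-list first means that a host appearing on both lists by mistake is still denied.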

However, accurate classification into those lists remains a limitation. For instance, a legitimate URL may be misclassified as black-listed, or vice versa. Moreover, most white-list and black-list approaches need frequent updating, since thousands of new threats emerge and evolve every day, and keeping the lists up to date is challenging [73].

  • DNS-based Black-list: The Domain Name System (DNS) can hold resource records that identify the hosts on a black-list, which are then looked up using the DNS protocol.

  • Botnet Detection: Prieto et al. [122] proposed a botnet detection system called Botnet Detection System (BDS), which includes network tools such as wget, Net-Whois, dig, and Perl scripts to analyze DNS traffic. They used a test-bed system infected with the Zeus, Conficker, and Kraken botnets to obtain the black-list data.

  • Firewall and Access-list: In [48], the authors defined a black-list for smartphones to avoid sending traffic to and receiving traffic from known malicious hosts. They also defined a white-list of the legitimate apps allowed to connect to the network.
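
For the DNS-based black-list item above, the common convention is to reverse the octets of the IPv4 address and append the black-list zone, then issue an ordinary DNS query for that name. The sketch below only builds the query name and performs no network lookup; the zone `dnsbl.example.org` is a hypothetical placeholder.

```python
import ipaddress

def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name used to ask a black-list zone about an IPv4 host."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on invalid input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

# Hypothetical lookup name for a documentation-range address.
name = dnsbl_query_name("192.0.2.99", "dnsbl.example.org")
```

If the name resolves, the host is listed; if the resolver reports that the name does not exist, the host is not on that black-list.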

Iv-F3 Endpoint Protection:

Endpoint protection describes a set of security solutions designed to secure the endpoints or entry points of end-user devices (e.g., laptops, tablets, smartphones, and other wireless devices) that are used to connect to organization networks. In recent years, organizations have increasingly had to contend not only with a growing number of endpoints but also with a rise in the number of endpoint types (e.g., IoT). Compounded by remote work and BYOD policies, these factors have created wider attack paths, making endpoint security more difficult and traditional firewall and antivirus-based approaches increasingly insufficient.

Endpoint protection can be characterized as attempting (1) to secure the entry points of end-user devices and (2) to protect endpoints on a network or in the cloud from threats. For the former, a number of user authentication mechanisms have been proposed to ensure that only legitimate end-user devices are connected safely to organization networks. Mutual authentication (https://tools.ietf.org/html/rfc5246) supports a mechanism where both entities (i.e., a client and a server) authenticate each other, based on either a certificate exchange or username/password verification. Open Authorization (OAuth) (https://oauth.net/2/) has become one of the most popular and widely used authentication mechanisms on the Internet, as it allows federated user authentication where a client can use a single authentication token to access several organizations across trust boundaries. With the growing concern about IoT devices increasingly connected to organization networks (via Fog or Edge computing), many monitoring systems incorporate the capability to authenticate IoT devices. Almadhoun et al. [8] proposed a decentralized and scalable authentication approach that utilizes blockchain-enabled connectivity to Ethereum smart contracts, where access tokens for communicating with the organization network server are issued by the smart contracts with no intermediary or trusted third party, effectively removing the overhead and expense associated with third-party solutions. Advanced biometric-based approaches have also been proposed that bind an end user to his/her registered mobile device (e.g., an Apple iPhone) to generate a unique device "fingerprint" and use the unclonable fingerprint to authenticate with the server [57, 180]. For the latter, NICE [85] uses low-level network switch properties to locate and map all the switches on a subnet and then associate rogue systems with specific physical switches. This is done automatically without relying on traditional network management tools and protocols (e.g., SNMP), which typically presume some prior knowledge of the network topology and often require administrative credentials. A number of agentless cloud endpoint monitoring approaches have been proposed to provide monitoring capability in the cloud, inspecting and analyzing endpoints that attempt to connect to cloud services without having to install software on every user device [18, 24].

Limitation of Anomaly Detection. Although anomaly detection techniques are useful to discover novel and unknown attacks, there are still some challenges in terms of training and learning capabilities of those techniques. For instance, the network traffic is very complex and unpredictable especially in a dynamic environment. Thus, the model is subject to changes over time because anomalies are continuously evolving. Due to the changes in attack techniques and patterns, the information gained (trained) previously may be invalid.

Iv-G Current Tools

In this section, we provide an overview of the tools and prototypes which have already been used in the design and development of SA monitoring systems. These cover data collection, pre-processing, processing, and more comprehensive analysis which can be further used in SA systems. We discuss the existing SA-related frameworks, prototypes, and tools which have been implemented in research projects and real-world systems.

Iv-G1 Log file collector and analysis

Event logging and network traffic analysis tools play a crucial role in designing and developing SA systems. Here, we summarize some of those used for SA systems.

  • SEC: The Simple Event Correlator (SEC) tool processes the text lines in log files, aiming to detect certain event groups within a defined time window [149]. It can analyze the log files and find frequent patterns using data mining algorithms such as breadth-first event log detection methods. SEC has been used for designing SA monitoring systems in both the data pre-processing and analysis phases [148].

  • NTE: Network Traffic Exploration (NTE) [151] is a security event packet analysis tool which can be useful to monitor network traffic, analyze it, and detect various network attacks. It can be leveraged by SA monitoring systems either to collect network traffic information or to detect attacks using pre-defined algorithms.

  • CogLog: Cognitive Case Log (CogLog) is a semantic tool which can keep a log of the findings of a given investigation. CogLog has been used in SA monitoring studies such as [22, 26].

  • PANOPTESEC: PANOPTESEC is a tool that manages the system's architecture and knowledge based on the security events and existing vulnerabilities. It can collect and further correlate log files and alerts to detect attacks. The processed information can be returned to the users or system administrators. PANOPTESEC has been used in SA monitoring systems in the pre-processing and analysis phases [12, 75].

  • NECOMA: NECOMA is a tool that collects network traffic data from network devices such as switches, routers, and IDS. It further analyzes the collected data to identify any attack attempts and mitigate the attacks [75].
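
The idea behind time-window event correlation, as described for SEC above, can be sketched with a sliding window over timestamped log lines. This is a generic illustration and does not reproduce SEC's rule syntax or algorithms; the log lines, the pattern, and the threshold are hypothetical, and timestamps are assumed to be sorted in ascending order.

```python
import re
from collections import deque

def correlate(lines, pattern, window_seconds, threshold):
    """Return the timestamps at which `threshold` matching events
    were seen within `window_seconds` of each other.

    lines is an iterable of (timestamp_in_seconds, text) pairs.
    """
    regex = re.compile(pattern)
    recent = deque()
    alerts = []
    for timestamp, text in lines:
        if not regex.search(text):
            continue
        recent.append(timestamp)
        # Drop matches that have fallen out of the time window.
        while recent and timestamp - recent[0] > window_seconds:
            recent.popleft()
        if len(recent) >= threshold:
            alerts.append(timestamp)
            recent.clear()  # start counting afresh after an alert
    return alerts

# Hypothetical authentication log.
log = [
    (0, "Failed password for root"),
    (10, "Accepted password for alice"),
    (20, "Failed password for admin"),
    (25, "Failed password for root"),
    (200, "Failed password for root"),
]
alerts = correlate(log, r"Failed password", window_seconds=60, threshold=3)
```

Three failures within a minute raise one alert, while the isolated failure later on does not.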

Iv-G2 Attack Graph generator tools

  • NetSPA: Network Security Planning Architecture (NetSPA) [160] is an Attack Graph (AG) generator and reachability analysis tool which includes a graphing subsystem component to visualize the computed attack graph. It can provide the assessment component of a survivable system. To address the scalability issue, it prunes the graph and simplifies it by removing the paths that do not reach the goal. NetSPA has been utilized by various studies for network monitoring and security evaluation systems such as [116].

  • GARNET: GARNET [161] is an extended version of NetSPA which is able to capture the reachability of the physical and logical topology by leveraging a graph subsystem based on tree maps. It is also able to evaluate the actual network situation through interaction with the system [11].
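
The pruning step mentioned for NetSPA, removing paths that do not reach the goal, can be illustrated with a backward reachability search. This is a simplified sketch and not NetSPA's actual algorithm; the graph and node names are hypothetical.

```python
from collections import deque

def prune_to_goal(edges, goal):
    """Keep only the edges that lie on some path leading to the goal.

    edges is a list of (source, destination) pairs of an attack graph.
    """
    # Index predecessors so that we can walk backwards from the goal.
    predecessors = {}
    for src, dst in edges:
        predecessors.setdefault(dst, []).append(src)
    can_reach_goal = {goal}
    queue = deque([goal])
    while queue:
        node = queue.popleft()
        for pred in predecessors.get(node, []):
            if pred not in can_reach_goal:
                can_reach_goal.add(pred)
                queue.append(pred)
    return [(s, d) for s, d in edges
            if s in can_reach_goal and d in can_reach_goal]

# Hypothetical attack graph; the database is the attacker's goal.
graph = [
    ("internet", "web"),
    ("web", "db"),
    ("web", "printer"),
    ("printer", "kiosk"),
    ("internet", "mail"),
]
pruned = prune_to_goal(graph, "db")
```

Only the two edges on the path from the Internet to the database remain, which keeps the graph small enough to visualize and analyze.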

Iv-G3 Threat analysis tools

The tools discussed so far are limited to capturing and analyzing network flow information and lack the capability to monitor and detect threats and vulnerabilities [94, 170]. However, various tools have been designed and developed to monitor and analyze threats and to obtain higher levels of situation awareness, such as perception and projection [163].

Fig. 3: CNSSA framework developed by [163].
  • CNSSA: Xi et al. [163] developed a real-time situation awareness tool named Comprehensive Network Security Situation Awareness (CNSSA), which is able to monitor the network environment based on the collected data and quantify the network situation awareness based on four metrics: Security, Threat, Vulnerability, and Stability. The CNSSA architecture is presented in Fig. 3. CNSSA is equipped with a visualization and monitoring module which illustrates the network situation in a detailed multi-level view with various threat viewing features.

  • Sol: Bradshaw et al. [22] developed a cyber security situation awareness framework named Sol. It analyzes cyber situation awareness using a multi-agent environment.

  • PERCIVAL: Angelini et al. [12] proposed a novel visual analytics environment that supports situational awareness by giving users an understanding of the network security posture and helping them monitor security events, such as reactive and proactive attacks, happening on the system.

  • MAD: Angelini et al. [11] developed a Multi-step cyber Attack Detection (MAD) Visual Analytics solution aiming to improve the network security by analyzing the possible attacks and identifying suitable mitigation techniques.

  • Vulnus: Angelini et al. [11] designed a visual analytics tool named VULNUS for dynamically inspecting the vulnerabilities spread on networks which helps to understand the network situation awareness. The proposed tool can visually classify nodes according to their vulnerabilities and compute the approximated optimal sequence of patches able to eliminate all the attack paths and allows for exploring sub-optimal patching strategies.

However, most of the existing tools still cannot capture more advanced threats, mainly because they lack an appropriate (i) data collection module for collecting information related to advanced threats and (ii) analysis module for evaluating the collected data and information to discover advanced threats.

V Situation Awareness

V-a Threat Evaluation

Threat Evaluation in SA falls into a layer between cyber situation comprehension and projection, as it needs to provide a higher-level perspective of the current threat situation based on both its present state and its future impact. The more comprehension is gained in this layer, the higher the level of SA understanding that can be obtained.

V-A1 Damage Assessment

Being aware of the impact of attacks and threats, together with vulnerability analysis, is the main part of Damage Assessment (also called Impact Assessment) [103]. Appropriate cyber situation awareness can help security experts make the right defense decisions and select appropriate defense actions. Security analysts need to perform the three basic awareness stages (Situation Perception, Situation Comprehension, and Situation Projection) to gain enough cyber situational awareness under severe cyber attacks. Damage assessment is an essential component of impact assessment and situation assessment in the situation comprehension stage and has been studied in various works [125, 103]. Predictive damage assessment, which evaluates and analyzes the damage that will be caused in the (near) future, is an important part of situation projection but is missing in the current literature.

V-A2 Attack Tracking and Prediction

Network security situation evaluation methods based on attack intention recognition have been studied in literature [91, 92, 54]. Kou et al. [91] proposed a method to recognize the attack intention on a network based on achieved attack phase and vulnerabilities in order to trace the next attack phase. They formulate the security situation () as Equation 1.

(1)

where, denotes the effect of each attack path on the network security situation which can be calculated based on the probability of attack stage with the attack threat and some weighted values. Then, the predicted security situation () is defined as Equation 2.

(2)

where is the quantity of attack path, and is the effect of the attack on the next attack stage. The proposed method can evaluate the network security situation based on attack intention and stage recognition. This technique is used as situation projection to further predict the next attack stage. However, their proposed technique only works based on the known attacks and cannot be evaluated based on unknown or new-type attacks.

Furthermore, Hu et al. [65] proposed a comprehensive situation prediction model based on overall network situation factors, such as the attacker, the defender, and the environment, to capture the adversary's characteristics. Their proposed solution incorporates important factors such as attack intention recognition, path detection, and success probability prediction. They further evaluated the threats by calculating the threat severity for critical assets in order to control the security situation. Obtaining those factors provides a high level of SA projection, which is useful for further decision making and response. Similarly, various threat evaluation methods have been proposed for different purposes, such as attack speed prediction [50], attack capacity inference [104], attack goal identification [50, 104, 113], attack path prediction [113, 50, 104], success probability prediction [124, 104], and attack time prediction [65].

V-B Decision Making and Planning

The SA system should be able to interact with situation response, in which a planned course of action needs to be taken [17]. Thus, before deploying any planned action, a decision should be made based on the consequences of that action. SA supports a decision maker's awareness and understanding of a situation up to the point the decision is made. Once a decision is reached, planning and execution (of the response actions) occur.

Fig. 4: Decision making for SA based on vulnerabilities, attacks, and defense metrics [121].

Making an appropriate decision for a given system can be determined based on the system’s situation. Pendleton et al. [121] defined the evolution of the situation for a given system based on a function of time which can be determined based on three attributes each of which can be represented as a function of time. They formulated the situation by defining the situation metrics by contributions of vulnerability, attack, and defense metrics, as shown in Equation 3. The relationship between those aspects is described as a model in Fig. 4.

$$\mathrm{situation}(t) = f\big(V(t), D(t), A(t)\big) \qquad (3)$$

where $V(t)$ represents a function of the existing vulnerabilities in the system at a given time $t$, and $D(t)$ and $A(t)$ denote the functions of defense and attack for time $t$. However, there is still only preliminary progress towards an explicit representation of the various kinds of required functions, $f$, corresponding to different situations and various attack-defense interactions.

Moreover, decision making in SA needs to be evaluated based on the current and future security postures. For instance, the evaluation should show whether a change in security practices causes negative impacts on future situations, such as monetary costs or reputation damage. Decisions should therefore be informed by an accurate understanding of the risks they entail [156].

V-C Visualisation

Visualization plays an important role in SA monitoring. It is a way to present the level of current threats, impacts, priorities, and sensitivity of the analyzed data in SA systems, and can serve as an interface between the computational process and human-oriented visual representation. This section discusses various visualization techniques used for SA, such as statistical, historical, near real-time, and real-time presentations of the SA system. Various existing SA tools use visualization techniques such as map-based views, chart-based views, network graphs, line-charts, and flow diagrams to present information [61, 172]. Visualization for cybersecurity purposes can be based on statistical summaries, which include techniques such as histograms, 2D or 3D graphs, and line-charts. Moreover, map-based visualization techniques such as geo-location views are useful for representing the sources and targets of cyber attacks. Fig. 5 shows some examples of these types of visualization. Line-chart visualization methods are useful for monitoring, especially in real-time, for example with a real-time sliding slice. In [47], a real-time monitoring feature is used for visualizing feature selection.

Fig. 5: (a) Geo-location visualization of many-to-many attacks [172], (b) An overview of the network graph visualization used in the literature [109], (c) demonstration of chart-based visualization techniques.
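The real-time line charts mentioned above need a compact series of recent event counts to draw. The following is a minimal sketch of one way to maintain such a series, assuming events arrive in timestamp order; the class name `SlidingWindowCounter` and its parameters are hypothetical and are not taken from any of the surveyed tools.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events per fixed-width time slice over a sliding window.

    Illustrative only: a hypothetical helper, not the method of any cited tool.
    """

    def __init__(self, slice_seconds, num_slices):
        self.slice_seconds = slice_seconds
        self.num_slices = num_slices
        # Each entry is [slice_index, count]; the oldest slice is on the left.
        self.slices = deque()

    def add(self, timestamp):
        """Record one event. Timestamps are assumed to be non-decreasing."""
        index = int(timestamp // self.slice_seconds)
        if self.slices and self.slices[-1][0] == index:
            self.slices[-1][1] += 1
        else:
            self.slices.append([index, 1])
        # Drop slices that have fallen out of the window.
        oldest_allowed = index - self.num_slices + 1
        while self.slices and self.slices[0][0] < oldest_allowed:
            self.slices.popleft()

    def series(self):
        """Return (slice_start_time, count) pairs, with 0 for empty slices."""
        if not self.slices:
            return []
        counts = {index: count for index, count in self.slices}
        last = self.slices[-1][0]
        first = max(self.slices[0][0], last - self.num_slices + 1)
        return [(i * self.slice_seconds, counts.get(i, 0))
                for i in range(first, last + 1)]
```

Each call to `series()` yields the points a line chart would plot for the current window. Keeping only per-slice counts, rather than raw events, bounds memory by the window length regardless of the event rate.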

Many studies have proposed system designs that analyze data and visualize threats to support SA in real-time monitoring [52, 106]. For instance, Mansmann et al. [106] proposed a screen-filling technique that shows the details of events in a data stream in near real time while following the historical trends of prior events. In another work, Best et al. [19] used a behavioral modeling method that learns the expected activities on a network. They presented a visualization system that combines various visualization techniques and tools and provides situational understanding of real-time network activities, helping analysts plan further response steps.

Healey et al. [61] reviewed scientific and information visualization in the visualization systems proposed for cyber SA. They then outlined a set of requirements for developing an appropriate visualization system for the cyber SA domain. Huffer and Reed [69] developed a proof-of-concept tool that discovers the roles of a system and helps cyber analysts detect changes in the network and devise a plan for incident response. Later, in [52], they expanded their work by proposing an anomaly detection visualization system that can discover and explain suspicious activities and behaviors in the network’s traffic and logs. They leveraged visualization features such as a temporal histogram, horizon graphs, bar charts, and a two-hop communication graph to show the network situation in real time.
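To make the idea of a two-hop communication graph concrete, the sketch below derives the one-hop and two-hop neighbors of a host from a list of observed flows. It is an illustration under simple assumptions (flows given as source/destination pairs, treated as undirected), not the implementation used in [52]; the function names are hypothetical.

```python
from collections import defaultdict


def build_adjacency(flows):
    """Build an undirected adjacency map from (source, destination) pairs."""
    adjacency = defaultdict(set)
    for src, dst in flows:
        if src != dst:
            adjacency[src].add(dst)
            adjacency[dst].add(src)
    return adjacency


def two_hop_neighborhood(adjacency, host):
    """Return (one_hop, two_hop) sets of hosts around `host`.

    `two_hop` holds hosts reachable in two hops that are neither direct
    neighbors of `host` nor `host` itself.
    """
    one_hop = set(adjacency.get(host, ()))
    two_hop = set()
    for neighbor in one_hop:
        two_hop.update(adjacency.get(neighbor, ()))
    two_hop -= one_hop
    two_hop.discard(host)
    return one_hop, two_hop
```

A visualization would draw the selected host at the center, its one-hop set as the inner ring, and its two-hop set as the outer ring, which limits the drawn graph to the context an analyst needs for one host.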

However, it is important to determine how best to leverage the visualization techniques for SA to allow analysts to monitor and detect unusual changes in the system, plan for incident response, and optimize the security posture.

VI Discussion

VI-A Misconceptions

Based on the current literature, we identified a number of misconceptions about cyber SA, listed as follows:

  • A large volume of data collected about the system and threats, for instance intelligence reports on threats and vulnerabilities, is not considered a part of situation awareness. However, building a monolithic SA system requires all stages of gaining SA, ranging from raw data and intelligence reports to highly abstracted data.

  • Intelligence and data sharing is not Cyber SA. Intelligence sharing is only one aspect of the data gathering phase; it can help the SA system in later stages by giving it a better understanding of the perceived or impending situation.

  • However, threat intelligence reports alone should not be considered cyber SA. Data and information gained through the data gathering phase, such as vulnerabilities and threats, are only the source for the SA analysis and situation awareness phases. Likewise, the outcome of intelligence reports is not cyber SA by itself and needs to be used as an input for further analysis in the other phases.

  • The large amount of collected data, whether raw, organized, or processed, is not Cyber SA. This collected data can be useful in understanding partial situations. Thus, the collected big data is not SA; it demonstrates only one aspect and organized perspective of the situation and needs to go through further analysis and evaluation.

VI-B Insights and Limitations

According to our extensive survey of situation awareness systems, their design, and their development in the current literature, we found that the following aspects of SA are still crude in current studies and need further investigation.

  • More comprehensive data gathering is needed, including handling large amounts of data, pre-processing, and parsing. Data collection needs to be real-time so that the SA system can be updated based on the current situation of the system. Moreover, using honeypots to collect and monitor real data needs further investigation in the literature.

  • Applying anomaly detection techniques in SA is still difficult in dynamic environments. Training and learning are challenging in this setting because network traffic in a dynamic environment is very complex and unpredictable. Thus, the model is subject to change over time because anomalies are continuously evolving.

  • There is still a lack of a comprehensive SA metric that can quantify the system’s current situation and capture changes in the system’s security situation in real time.

  • AI-based approaches are still missing for both the attacker and defender sides, such as game-theoretic and control-theoretic modeling, adversarial modeling for AI-powered threats, artificial intelligence techniques, and human-computer interfaces. The vision of the future system is based on side-by-side interaction between human analysts and automated AI-based systems and tools. Moreover, SA system design and development requires more sophisticated human-computer interaction and improved self-learning abilities for defensive and monitoring systems. This interaction will make the transition from human-based defense systems to AI-based systems in the SA context more effective and will enable defenders to prepare automatically against potential threats, adapting quickly to evolving cyber attacks.
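The limitation on anomaly detection in dynamic environments can be illustrated with a detector whose baseline adapts as traffic changes. The sketch below is one simple option, an exponentially weighted moving average (EWMA) of a single numeric feature such as connections per minute; it is an assumption chosen for illustration, not a technique proposed in the surveyed studies, and the class name and default parameters are hypothetical.

```python
import math


class EwmaAnomalyDetector:
    """Flags values far from an exponentially weighted moving baseline.

    Illustrative only: alpha, threshold, and warmup are assumed defaults.
    """

    def __init__(self, alpha=0.1, threshold=3.0, warmup=5):
        self.alpha = alpha          # weight given to the newest observation
        self.threshold = threshold  # flag beyond this many standard deviations
        self.warmup = warmup        # observations to absorb before flagging
        self.mean = 0.0
        self.variance = 0.0
        self.count = 0

    def update(self, value):
        """Score `value` against the current baseline, then adapt the baseline.

        Returns True when the value is flagged as anomalous.
        """
        self.count += 1
        if self.count == 1:
            self.mean = value
            return False
        deviation = value - self.mean
        std = math.sqrt(self.variance)
        is_anomaly = (self.count > self.warmup
                      and std > 0
                      and abs(deviation) > self.threshold * std)
        # Incremental EWMA updates of the mean and variance.
        self.mean += self.alpha * deviation
        self.variance = (1 - self.alpha) * (
            self.variance + self.alpha * deviation ** 2)
        return is_anomaly
```

Because the baseline keeps moving, a sustained shift in traffic is flagged when it first appears and is then absorbed as the new normal. This is the trade-off the bullet describes: a model that follows a changing environment can also learn to accept a slowly evolving attack.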

VII Conclusions

Emerging threats are sophisticated, complex, and highly dynamic, and need to be addressed using situation awareness monitoring systems equipped to monitor and defend against a wide range of attacks, including AI-supported attacks. In this paper, we discussed the cyber SA taxonomy based on a comprehensive framework comprising different situation awareness levels: data gathering mapped to situation perception, analysis and techniques mapped to situation comprehension, and situation awareness mapped to situation projection. We then conducted an extensive survey of each level, discussed the current state of the art for each, and highlighted the limitations. We also introduced the tools and prototypes that can be used in SA systems for the analysis or visualization phases.

Acknowledgement

This work was supported by the Cyber Security Research Programme, “Artificial Intelligence for Automating Response to Threats”, from the Ministry of Business, Innovation, and Employment (MBIE) of New Zealand as a part of the Catalyst Strategy Funds under Grant MAUX1912.

References

  • [1] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang (2016) Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308–318. Cited by: TABLE II.
  • [2] H. Abdi and D. Valentin (2007) Multiple correspondence analysis. Encyclopedia of measurement and statistics 2 (4), pp. 651–657. Cited by: §IV-E.
  • [3] A. Adenusi Dauda, E. Ayeleso, A. Kawonise, J. Ekuewa, and A. Adebayo (2017) Development of threats detection model for cyber situation awareness. Technology (ICONSEET) 2 (15), pp. 113–126. Cited by: §IV-B1, §IV-D.
  • [4] H. Alavizadeh, H. Alavizadeh, D. S. Kim, J. Jang-Jaccard, and M. N. Torshiz (2020) An automated security analysis framework and implementation for mtd techniques on cloud. In Information Security and Cryptology – ICISC 2019, J. H. Seo (Ed.), Cham, pp. 150–164. External Links: ISBN 978-3-030-40921-0 Cited by: §III-C.
  • [5] H. Alavizadeh, J. B. Hong, D. S. Kim, and J. Jang-Jaccard (2021) Evaluating the effectiveness of shuffle and redundancy mtd techniques in the cloud. Computers & Security 102, pp. 102091. Cited by: §IV-C.
  • [6] H. Alavizadeh, H. Alavizadeh, and J. Jang-Jaccard (2020) Cyber situation awareness monitoring and proactive response for enterprises on the cloud. In 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), pp. 1276–1284. Cited by: §III-C.
  • [7] M. Albanese, N. Cooke, G. Coty, D. Hall, C. Healey, S. Jajodia, P. Liu, M. D. McNeese, P. Ning, D. Reeves, et al. (2017) Computer-aided human centric cyber situation awareness. In Theory and Models for Cyber Situation Awareness, pp. 3–25. Cited by: §I-A, TABLE III, §IV-C.
  • [8] R. Almadhoun, M. Kadadha, M. Alhemeiri, M. Alshehhi, and K. Salah (2018) A user authentication scheme of iot devices using blockchain-enabled fog nodes. In 2018 IEEE/ACS 15th international conference on computer systems and applications (AICCSA), pp. 1–8. Cited by: §IV-F3.
  • [9] E. Alpaydin (2014) Introduction to machine learning. 3rd edition, Adaptive Computation and Machine Learning series, The MIT Press. Cited by: §IV-B1.
  • [10] E. Alpaydin (2020) Introduction to machine learning. 4th edition, Adaptive Computation and Machine Learning series, The MIT Press. Cited by: §IV-B1.
  • [11] M. Angelini, S. Bonomi, S. Lenti, G. Santucci, and S. Taggi (2019) MAD: a visual analytics solution for multi-step cyber attacks detection. Journal of Computer Languages 52, pp. 10–24. Cited by: item –, item –, item –.
  • [12] M. Angelini, N. Prigent, and G. Santucci (2015) PERCIVAL: proactive and reactive attack and response assessment for cyber incidents using visual analytics. In 2015 IEEE Symposium on Visualization for Cyber Security (VizSec), pp. 1–8. Cited by: item –, item –.
  • [13] G. Ateniese, L. V. Mancini, A. Spognardi, A. Villani, D. Vitali, and G. Felici (2015) Hacking smart machines with smarter ones: how to extract meaningful data from machine learning classifiers. International Journal of Security and Networks 10 (3), pp. 137–150. Cited by: §II-B2, TABLE II.
  • [14] T. Back, D. B. Fogel, and Z. Michalewicz (1997) Handbook of evolutionary computation. 1st edition, Computational Intelligence Library, IOP Publishing Ltd.. Cited by: §IV-B2.
  • [15] A. C. Bahnsen, I. Torroledo, L. D. Camacho, and S. Villegas (2018) DeepPhish: simulating malicious ai. In 2018 APWG Symposium on Electronic Crime Research (eCrime), pp. 1–8. Cited by: TABLE II.
  • [16] P. Barford, Y. Chen, A. Goyal, Z. Li, V. Paxson, and V. Yegneswaran (2010) Employing honeynets for network situational awareness. In Cyber Situational Awareness, pp. 71–102. Cited by: §III-A.
  • [17] P. Barford, M. Dacier, T. G. Dietterich, M. Fredrikson, J. Giffin, S. Jajodia, S. Jha, J. Li, P. Liu, P. Ning, et al. (2010) Cyber sa: situational awareness for cyber defense. In Cyber situational awareness, pp. 3–13. Cited by: §I-A, §V-B.
  • [18] K. Berlin, D. Slater, and J. Saxe (2015) Malicious behavior detection using windows audit logs. In Proceedings of the 8th ACM Workshop on Artificial Intelligence and Security, pp. 35–44. Cited by: §IV-F3.
  • [19] D. M. Best, S. Bohn, D. Love, A. Wynne, and W. A. Pike (2010) Real-time visualization of network behaviors for situational awareness. In Proceedings of the seventh international symposium on visualization for cyber security, pp. 79–90. Cited by: §V-C.
  • [20] C. M. Bishop (2006) Pattern recognition and machine learning (information science and statistics). Springer. Cited by: item –.
  • [21] E. Blasch, D. Shen, K. D. Pham, and G. Chen (2015) Review of game theory applications for situation awareness. In Sensors and Systems for Space Applications VIII, Vol. 9469, pp. 94690I. Cited by: §IV-C.
  • [22] J. M. Bradshaw, M. Carvalho, L. Bunch, T. Eskridge, P. J. Feltovich, M. Johnson, and D. Kidwell (2012) Sol: an agent-based framework for cyber situation awareness. KI-Künstliche Intelligenz 26 (2), pp. 127–140. Cited by: item –, item –, §IV-F2.
  • [23] K. Brancik and G. Ghinita (2011) The optimization of situational awareness for insider threat detection. In Proceedings of the first ACM conference on Data and application security and privacy, pp. 231–236. Cited by: item –, TABLE I.
  • [24] M. Brattstrom and P. Morreale (2017) Scalable agentless cloud network monitoring. In 2017 IEEE 4th International Conference on Cyber Security and Cloud Computing (CSCloud), pp. 171–176. Cited by: §IV-F3.
  • [25] J. Brynielsson, U. Franke, and S. Varga (2016) Cyber situational awareness testing. In Combatting Cybercrime and Cyberterrorism, pp. 209–233. Cited by: item –.
  • [26] L. Bunch, J. M. Bradshaw, M. Carvalho, T. Eskridge, P. J. Feltovich, J. Lott, and A. Uszok (2012) Human-agent teamwork in cyber operations: supporting co-evolution of tasks and artifacts with luna. In German Conference on Multiagent System Technologies, pp. 53–67. Cited by: item –.
  • [27] S. Chakradeo, B. Reaves, P. Traynor, and W. Enck (2013) Mast: triage for market-scale mobile malware analysis. In Proceedings of the sixth ACM conference on Security and privacy in wireless and mobile networks, pp. 13–24. Cited by: §IV-E.
  • [28] K. Champaneria and B. S. K. Panchal (2014) Survey of adaptive resonance theory techniques in ids. International Journal of Emerging Technology and Advanced Engineering 4 (12). Cited by: §IV-B1.
  • [29] A. K. Charles, N. Pissinou, A. Busovaca, and K. Makki (2010) Belief-free equilibrium of packet forwarding game in ad hoc networks under imperfect monitoring. In International Performance Computing and Communications Conference, pp. 315–324. Cited by: §IV-C.
  • [30] K. Chawda and A. D. Patel (2014) Dynamic & hybrid honeypot model for scalable network monitoring. In International conference on information communication and embedded systems (ICICES2014), pp. 1–5. Cited by: §III-A.
  • [31] F. Chen, Y. Shen, G. Zhang, and X. Liu (2013) The network security situation predicting technology based on the small-world echo state network. In 2013 IEEE 4th International Conference on Software Engineering and Service Science, pp. 377–380. Cited by: §IV-B1.
  • [32] K. Chung, C. A. Kamhoua, K. A. Kwiat, Z. T. Kalbarczyk, and R. K. Iyer (2016) Game theory with learning for cyber security monitoring. In 2016 IEEE 17th International Symposium on High Assurance Systems Engineering (HASE), pp. 1–8. Cited by: §IV-C, §IV-D.
  • [33] M. Howard and D. LeBlanc (2002) Writing Secure Code. Vol. 2, Pearson Education. Cited by: §II.
  • [34] DARPA (cited August 2020) Cyber grand challenge (cgc). Note: https://www.darpa.mil/program/cyber-grand-challenge Cited by: §II-B2.
  • [35] DARPA (cited August 2020) Cyber-hunting at scale (chase) - defense advanced research projects agency. Note: https://www.darpa.mil/program/cyber-hunting-at-scale Cited by: §II-B2.
  • [36] D. Dasgupta and F. González (2002) An immunity-based technique to characterize intrusions in computer networks. IEEE Transactions on evolutionary computation 6 (3), pp. 281–291. Cited by: §IV-F1.
  • [37] L. Deng and D. Yu (2014) Deep learning: methods and applications. Foundations and trends in signal processing 7 (3–4), pp. 197–387. Cited by: §IV-F1.
  • [38] T. G. Dietterich, X. Bao, V. Keiser, and J. Shen (2010) Machine learning methods for high level cyber situation awareness. In Cyber Situational Awareness: Issues and Research, S. Jajodia, P. Liu, V. Swarup, and C. Wang (Eds.), pp. 227–247. External Links: ISBN 978-1-4419-0140-8, Document, Link Cited by: §IV-B1.
  • [39] V. Dutt, Y. Ahn, and C. Gonzalez (2013) Cyber situation awareness: modeling detection of cyber attacks with instance-based learning theory. Human Factors 55 (3), pp. 605–618. Cited by: item –, item –, TABLE I.
  • [40] R. C. Eberhart and J. Kennedy (1995) A new optimizer using particle swarm theory. In Proceedings of the 6th International Symposium on Micro Machine and Human Science, pp. 39–43. Cited by: §IV-B2.
  • [41] M. R. Endsley (1988) Design and evaluation for situation awareness enhancement. In Proceedings of the Human Factors Society annual meeting, Vol. 32, pp. 97–101. Cited by: §I-A.
  • [42] M. R. Endsley (1995) Measurement of situation awareness in dynamic systems. Human factors 37 (1), pp. 65–84. Cited by: §I-A.
  • [43] J. Ezick, T. Henretty, M. Baskaran, R. Lethin, J. Feo, T. Tuan, C. Coley, L. Leonard, R. Agrawal, B. Parsons, et al. (2019) Combining tensor decompositions and graph analytics to provide cyber situational awareness at hpc scale. In 2019 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1–7. Cited by: §IV-F2.
  • [44] C. Fachkha and M. Debbabi (2015) Darknet as a source of cyber intelligence: survey, taxonomy, and characterization. IEEE Communications Surveys & Tutorials 18 (2), pp. 1197–1227. Cited by: §III-A.
  • [45] G. Fink, D. Best, D. Manz, V. Popovsky, and B. Endicott-Popovsky (2013) Gamification for measuring cyber security situational awareness. In International Conference on Augmented Cognition, pp. 656–665. Cited by: item –, TABLE III.
  • [46] U. Fiore, F. Palmieri, A. Castiglione, and A. De Santis (2013) Network anomaly detection with the restricted boltzmann machine. Neurocomputing 122, pp. 13–23. Cited by: §IV-F1.
  • [47] F. Fischer and D. A. Keim (2014) NStreamAware: real-time visual analytics for data streams to enhance situational awareness. In Proceedings of the Eleventh Workshop on Visualization for Cyber Security, pp. 65–72. Cited by: §V-C.
  • [48] W. M. Fitzgerald, U. Neville, and S. N. Foley (2013) MASON: mobile autonomic security for network access controls. Journal of Information Security and Applications 18 (1), pp. 14–29. Cited by: item –.
  • [49] U. Franke and J. Brynielsson (2014) Cyber situational awareness–a systematic review of the literature. Computers & security 46, pp. 18–31. Cited by: §I-D.
  • [50] O. B. Fredj (2015) A realistic graph-based alert correlation system. Security and Communication Networks 8 (15), pp. 2477–2493. Cited by: §V-A2.
  • [51] I. Friedberg, F. Skopik, and R. Fiedler (2015) Cyber situational awareness through network anomaly detection: state of the art and new approaches. e & i Elektrotechnik und Informationstechnik 132 (2), pp. 101–105. Cited by: §IV-F.
  • [52] J. R. Goodall, E. D. Ragan, C. A. Steed, J. W. Reed, G. D. Richardson, K. M. Huffer, R. A. Bridges, and J. A. Laska (2018) Situ: identifying and explaining suspicious behavior in networks. IEEE transactions on visualization and computer graphics 25 (1), pp. 204–214. Cited by: TABLE I, §V-C, §V-C.
  • [53] A. Gouglidis, B. Green, J. Busby, M. Rouncefield, D. Hutchison, and S. Schauer (2016) Threat awareness for critical infrastructures resilience. In 2016 8th International Workshop on Resilient Networks Design and Modeling (RNDM), pp. 196–202. Cited by: TABLE I.
  • [54] K. Guang, T. Guangming, D. Xia, W. Shuo, and W. Kun (2016) A network security situation assessment method based on attack intention perception. In 2016 2nd IEEE International Conference on Computer and Communications (ICCC), pp. 1138–1142. Cited by: §V-A2.
  • [55] W. Guo, X. Tang, J. Cheng, J. Xu, C. Cai, and Y. Guo (2019) DDoS attack situation information fusion method based on dempster-shafer evidence theory. In International Conference on Artificial Intelligence and Security, pp. 396–407. Cited by: TABLE I.
  • [56] Y. Guo, Z. Zhang, Y. Guo, and X. Guo (2020) Nudging personalized password policies by understanding users’ personality. Computers & Security 94, pp. 101801. External Links: ISSN 0167-4048, Document, Link Cited by: TABLE II.
  • [57] Y. Guo and A. Tyagi (2017) Voice-based user-device physical unclonable functions for mobile device authentication. Journal of Hardware and Systems Security 1 (1), pp. 18–37. Cited by: §IV-F3.
  • [58] L. Harrison, J. Laska, R. Spahn, M. Iannacone, E. Downing, E. M. Ferragut, and J. R. Goodall (2012) Situ: situational understanding and discovery for cyber attacks. In 2012 IEEE Conference on Visual Analytics Science and Technology (VAST), pp. 307–308. Cited by: §IV-F.
  • [59] C. He and Y. Li (2017) Survey of network security situation awareness. In 2017 International Conference on Computational Science and Engineering (ICCSE 2017), Cited by: §IV.
  • [60] F. He, Y. Zhang, H. Liu, and W. Zhou (2018) SCPN-based game model for security situational awareness in the intenet of things. In 2018 IEEE Conference on Communications and Network Security (CNS), pp. 1–5. Cited by: item –, item –, item –, item –, TABLE I.
  • [61] C. G. Healey, L. Hao, and S. E. Hutchinson (2014) Visualizations and analysts. In Cyber Defense and Situational Awareness, pp. 145–165. Cited by: §V-C, §V-C.
  • [62] B. Hitaj, G. Ateniese, and F. Perez-Cruz (2017) Deep models under the gan: information leakage from collaborative deep learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 603–618. Cited by: §II-B2, TABLE II.
  • [63] B. Hitaj, P. Gasti, G. Ateniese, and F. Perez-Cruz (2019) Passgan: a deep learning approach for password guessing. In International Conference on Applied Cryptography and Network Security, pp. 217–237. Cited by: §II-B1, TABLE II.
  • [64] K. Hornik, M. Stinchcombe, and H. White (1989) Multilayer feedforward networks are universal approximators. Neural Networks 2 (5), pp. 359–366. Cited by: TABLE II.
  • [65] H. Hu, H. Zhang, Y. Liu, and Y. Wang (2017) Quantitative method for network security situation based on attack prediction. Security and Communication Networks 2017. Cited by: §V-A2.
  • [66] J. Hu, D. Ma, L. Chen, H. Yan, and C. Hu (2019) An improved prediction model for the network security situation. In International Conference on Smart Computing and Communication, pp. 22–33. Cited by: §IV-B1.
  • [67] J. Hu, D. Ma, C. Liu, Z. Shi, H. Yan, and C. Hu (2019) Network security situation prediction based on mr-svm. IEEE Access 7, pp. 130937–130945. Cited by: §IV-B1.
  • [68] W. Hu and Y. Tan (2017) Generating adversarial malware examples for black-box attacks based on gan. arXiv preprint arXiv:1702.05983. Cited by: §II-B2, TABLE II.
  • [69] K. M. Huffer and J. W. Reed (2017) Situational awareness of network system roles (sansr). In Proceedings of the 12th Annual Conference on Cyber and Information Security Research, pp. 1–4. Cited by: §V-C.
  • [70] M. Husák, J. Komárková, E. Bou-Harb, and P. Čeleda (2018) Survey of attack projection, prediction, and forecasting in cyber security. IEEE Communications Surveys & Tutorials 21 (1), pp. 640–660. Cited by: §I-D.
  • [71] F. Iglesias and T. Zseby (2015) Analysis of network traffic features for anomaly detection. Machine Learning 101 (1-3), pp. 59–84. Cited by: §IV-F1.
  • [72] G. Ioannou, P. Louvieris, and N. Clewley (2019) A markov multi-phase transferable belief model for cyber situational awareness. IEEE Access 7, pp. 39305–39320. Cited by: item –, TABLE I.
  • [73] A. K. Jain and B. B. Gupta (2016) A novel approach to protect against phishing attacks at client side using auto-updated white-list. EURASIP Journal on Information Security 2016 (1), pp. 9. Cited by: §IV-F2.
  • [74] J. Jang, D. Brumley, and S. Venkataraman (2011) Bitshred: feature hashing malware for scalable triage and semantic analysis. In Proceedings of the 18th ACM conference on Computer and communications security, pp. 309–320. Cited by: §IV-E.
  • [75] M. Janiszewski, A. Felkner, and P. Lewandowski (2019) A novel approach to national-level cyber risk assessment based on vulnerability management and threat intelligence. Journal of Telecommunications and Information Technology. Cited by: item –, item –.
  • [76] A. Javaid, Q. Niyaz, W. Sun, and M. Alam (2016) A deep learning approach for network intrusion detection system. In Proceedings of the 9th EAI International Conference on Bio-inspired Information and Communications Technologies (formerly BIONETICS), pp. 21–26. Cited by: §IV-F1.
  • [77] M. Jiang (2020) Improving situational awareness with collective artificial intelligence over knowledge graphs. In Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications II, Vol. 11413, pp. 114130J. Cited by: §II-B.
  • [78] T. Joachims (1999) Transductive inference for text classification using support vector machines. In Proceedings of the 16th International Conference on Machine Learning, pp. 200–209. Cited by: item –.
  • [79] K. S. Jones (1972) A statistical interpretation of term specificity and its application in retrieval. Journal of documentation. Cited by: §IV-E.
  • [80] K. A. D. Jong (2006) Evolutionary computation: a unified approach. MIT Press. Cited by: §IV-B2.
  • [81] N. Kaloudi and J. Li (2020) The ai-based cyber threat landscape: a survey. ACM Computing Surveys (CSUR) 53 (1), pp. 1–34. Cited by: §II-B.
  • [82] T. Kanstrén and A. Evesti (2016) A study on the state of practice in security situational awareness. In 2016 IEEE International Conference on Software Quality, Reliability and Security Companion (QRS-C), pp. 69–76. Cited by: TABLE III.
  • [83] H. G. Kayacik, A. N. Zincir-Heywood, and M. I. Heywood (2005) Selecting features for intrusion detection: a feature relevance analysis on kdd 99 intrusion detection datasets. In Proceedings of the third annual conference on privacy, security and trust, Vol. 94, pp. 1723–1722. Cited by: item –.
  • [84] M. A. Khan and K. Salah (2018) IoT security: review, blockchain solutions, and open challenges. Future Generation Computer Systems 82, pp. 395–411. Cited by: §III-A.
  • [85] D. Kienzle, N. Evans, and M. Elder (2013) NICE: network introspection by collaborating endpoints. In 2013 IEEE Conference on Communications and Network Security (CNS), pp. 411–412. Cited by: §IV-F3.
  • [86] J. Kim, J. Kim, H. L. T. Thu, and H. Kim (2016) Long short term memory recurrent neural network classifier for intrusion detection. In 2016 International Conference on Platform Technology and Service (PlatCon), pp. 1–5. Cited by: §I-A, §IV-F1.
  • [87] D. Kirat, J. Jang, and M. Stoecklin (2018) Deeplocker–concealing targeted attacks with ai locksmithing. Blackhat USA. Cited by: §II-B1, TABLE II.
  • [88] D. Kirat, L. Nataraj, G. Vigna, and B. Manjunath (2013) Sigmal: a static signal processing based malware triage. In Proceedings of the 29th Annual Computer Security Applications Conference, pp. 89–98. Cited by: §IV-E.
  • [89] I. Koniaris, G. Papadimitriou, P. Nicopolitidis, and M. Obaidat (2014) Honeypots deployment for the analysis and visualization of malware activity and malicious connections. In 2014 IEEE international conference on communications (ICC), pp. 1819–1824. Cited by: §III-A.
  • [90] I. Kotenko, E. Doynikova, A. Chechulin, and A. Fedorchenko (2018) AI-and metrics-based vulnerability-centric cyber security assessment and countermeasure selection. In Guide to Vulnerability Analysis for Computer Networks and Systems, pp. 101–130. Cited by: §III-C, §III-F.
  • [91] G. Kou, S. Wang, and G. Tang (2019) Research on key technologies of network security situational awareness for attack tracking prediction. Chinese Journal of Electronics 28 (1), pp. 162–171. Cited by: item –, item –, TABLE I, §V-A2.
  • [92] W. Kun, Q. Hui, Y. Haopu, and H. Di (2015) Network security situation evaluation method based on attack intention recognition. In 2015 4th International Conference on Computer Science and Network Technology (ICCSNT), Vol. 1, pp. 919–924. Cited by: §V-A2.
  • [93] A. Lakhotia, A. Walenstein, C. Miles, and A. Singh (2013) Vilo: a rapid learning nearest-neighbor classifier for malware triage. Journal of Computer Virology and Hacking Techniques 9 (3), pp. 109–123. Cited by: §IV-E.
  • [94] K. Lakkaraju, W. Yurcik, and A. J. Lee (2004) NVisionIP: netflow visualizations of system state for security situational awareness. In Proceedings of the 2004 ACM workshop on Visualization and data mining for computer security, pp. 65–72. Cited by: §I-A, §IV-G3.
  • [95] H. Larochelle and Y. Bengio (2008) Classification using discriminative restricted boltzmann machines. In Proceedings of the 25th international conference on Machine learning, pp. 536–543. Cited by: §IV-F1.
  • [96] Y. Leau and S. Manickam (2015) Network security situation prediction: a review and discussion. In International Conference on Soft Computing, Intelligence Systems, and Information Technology, pp. 424–435. Cited by: §I-D, item –, §IV-B1.
  • [97] D. Li and Z. Liu (2013) Situation element extraction of network security based on logistic regression and improved particle swarm optimization. In 2013 Ninth International Conference on Natural Computation (ICNC), pp. 569–573. Cited by: §IV-B2.
  • [98] Y. Li, G. Huang, C. Wang, and Y. Li (2019) Analysis framework of network security situational awareness and comparison of implementation methods. EURASIP Journal on Wireless Communications and Networking 2019 (1), pp. 205. Cited by: §IV-F, §IV-F.
  • [99] Y. Li (2020) Research on network security situation awareness strategy based on markov game model. In The International Conference on Cyber Security Intelligence and Analytics, pp. 603–608. Cited by: §IV-C.
  • [100] Z. Li, A. Goyal, and Y. Chen (2008) Honeynet-based botnet scan traffic analysis. In Botnet Detection, pp. 25–44. Cited by: §III-A.
  • [101] Y. Liang, H. Wang, and J. Lai (2007) Quantification of network security situational awareness based on evolutionary neural network. In 2007 International Conference on Machine Learning and Cybernetics, Vol. 6, pp. 3267–3272. Cited by: §IV-B2.
  • [102] Z. Lin, G. Chen, W. Guo, and Y. Liu (2008) PSO-bpnn-based prediction of network security situation. In 2008 3rd International Conference on Innovative Computing Information and Control, pp. 37–37. Cited by: §IV-B2.
  • [103] P. Liu, X. Jia, S. Zhang, X. Xiong, Y. Jhi, K. Bai, and J. Li (2010) Cross-layer damage assessment for cyber situational awareness. In Cyber Situational Awareness, pp. 155–176. Cited by: §I-A, §V-A1.
  • [104] S. Liu and Y. Liu (2016) Network security risk assessment method based on hmm and attack graph model. In 2016 17th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), pp. 517–522. Cited by: §V-A2.
  • [105] G. Lu and D. Feng (2018) Network security situation awareness for industrial control system under integrity attacks. In 2018 21st International Conference on Information Fusion (FUSION), pp. 1808–1815. Cited by: item –, TABLE I.
  • [106] F. Mansmann, M. Krstajic, F. Fischer, and E. Bertini (2012) StreamSqueeze: a dynamic stream visualization for monitoring of event data. In Visualization and Data Analysis 2012, Vol. 8294, pp. 829404. Cited by: §I-A, §V-C.
  • [107] S. McElwee, J. Heaton, J. Fraley, and J. Cannady (2017) Deep learning for prioritizing and responding to intrusion detection alerts. In MILCOM 2017-2017 IEEE Military Communications Conference (MILCOM), pp. 1–5. Cited by: §IV-F1.
  • [108] S. McElwee (2017) Active learning intrusion detection using k-means clustering selection. In SoutheastCon 2017, pp. 1–7. Cited by: §IV-F1, §IV-F.
  • [109] S. McKenna, D. Staheli, and M. Meyer (2015) Unlocking user-centered design methods for building cyber security visualizations. In 2015 IEEE Symposium on Visualization for Cyber Security (VizSec), pp. 1–8. Cited by: Fig. 5.
  • [110] J. Meng, C. Ma, J. He, and H. Zhang (2011) Network security situation prediction model based on hhga-rbf neural network. Computer Science 38 (7), pp. 70. Cited by: §IV-B1, §IV-B2.
  • [111] S. Mukkamala, G. Janoski, and A. Sung (2002) Intrusion detection using neural networks and support vector machines. In Proceedings of the 2002 International Joint Conference on Neural Networks. IJCNN’02 (Cat. No. 02CH37290), Vol. 2, pp. 1702–1707. Cited by: §IV-F1.
  • [112] H. Nakakoji, Y. Fujii, Y. Isobe, T. Shigemoto, T. Kito, N. Hayashi, N. Kawaguchi, N. Shimotsuma, and H. Kikuchi (2016) Proposal and evaluation of cyber defense system using blacklist refined based on authentication results. In 2016 19th International Conference on Network-Based Information Systems (NBiS), pp. 135–139. Cited by: §IV-F2.
  • [113] A. K. Nandi, H. R. Medal, and S. Vadlamani (2016) Interdicting attack graphs to protect organizations from cyber attacks: a bi-level defender–attacker model. Computers & Operations Research 75, pp. 118–131. Cited by: §V-A2.
  • [114] L. Nataraj, S. Karthikeyan, G. Jacob, and B. S. Manjunath (2011) Malware images: visualization and automatic classification. In Proceedings of the 8th international symposium on visualization for cyber security, pp. 1–7. Cited by: §IV-E.
  • [115] K. Neshatian, M. Zhang, and P. Andreae (2012) A filter approach to multiple feature construction for symbolic learning classifiers using genetic programming. IEEE Transactions on Evolutionary Computation 16 (5), pp. 645–661. Cited by: §IV-F1.
  • [116] H. Okhravi, J. Haines, and K. Ingols (2011) Achieving cyber survivability in a contested environment using a cyber moving target. High Frontier Journal 7 (3), pp. 9–13. Cited by: item –.
  • [117] B. Pal, T. Daniel, R. Chatterjee, and T. Ristenpart (2019) Beyond credential stuffing: password similarity models using neural networks. In 2019 IEEE Symposium on Security and Privacy (SP), pp. 417–434. Cited by: TABLE II.
  • [118] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami (2016) The limitations of deep learning in adversarial settings. In 2016 IEEE European symposium on security and privacy (EuroS&P), pp. 372–387. Cited by: §II-B2, TABLE II.
  • [119] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami (2016) Distillation as a defense to adversarial perturbations against deep neural networks. In 2016 IEEE Symposium on Security and Privacy (SP), pp. 582–597. Cited by: TABLE II.
  • [120] D. Pasquini, G. Ateniese, and M. Bernaschi (2020) Interpretable probabilistic password strength meters via deep learning. arXiv preprint arXiv:2004.07179. Cited by: TABLE II.
  • [121] M. Pendleton, R. Garcia-Lebron, J. Cho, and S. Xu (2016) A survey on systems security metrics. ACM Computing Surveys (CSUR) 49 (4), pp. 1–35. Cited by: Fig. 4, §V-B.
  • [122] I. Prieto, E. Magaña, D. Morató, and M. Izal (2011) Botnet detection based on dns records and active probing. In Proceedings of the International Conference on Security and Cryptography, pp. 307–316. Cited by: item –.
  • [123] N. Provos et al. (2004) A virtual honeypot framework. In USENIX Security Symposium, Vol. 173, pp. 1–14. Cited by: §III-A.
  • [124] Z. Qu, Y. Li, et al. (2010) A network security situation evaluation method based on ds evidence theory. In 2010 The 2nd Conference on Environmental Science and Information Application Technology, Vol. 2, pp. 496–499. Cited by: §V-A2.
  • [125] P. Rajivan and N. Cooke (2017) Impact of team collaboration on cybersecurity situational awareness. In Theory and Models for Cyber Situation Awareness, pp. 203–226. Cited by: TABLE I, §V-A1.
  • [126] Rhodes (2019) Artificial intelligence and offensive cyber weapons. Strategic Comments 25 (10), pp. x–xii. Cited by: §II-B1.
  • [127] M. A. Salama, H. F. Eid, R. A. Ramadan, A. Darwish, and A. E. Hassanien (2011) Hybrid intelligent intrusion detection scheme. In Soft computing in industrial applications, pp. 293–303. Cited by: §IV-F1, §IV-F1.
  • [128] T. Segaran (2007) Programming collective intelligence. 1st edition, O’Reilly. Cited by: §IV-B1.
  • [129] J. Seymour and P. Tully (2016) Weaponizing data science for social engineering: automated e2e spear phishing on twitter. Black Hat USA 37, pp. 1–39. Cited by: §II-B1, TABLE II.
  • [130] D. Shen, G. Chen, J. B. Cruz Jr, L. Haynes, M. Kruger, and E. Blasch (2007) A markov game theoretic data fusion approach for cyber situational awareness. In Multisensor, Multisource Information Fusion: Architectures, Algorithms, and Applications 2007, Vol. 6571, pp. 65710F. Cited by: §IV-A, §IV-C.
  • [131] Q. Shi, J. Petterson, G. Dror, J. Langford, A. Smola, and S. Vishwanathan (2009) Hash kernels for structured data. The Journal of Machine Learning Research 10, pp. 2615–2637. Cited by: §IV-E.
  • [132] R. Shokri and V. Shmatikov (2015) Privacy-preserving deep learning. In Proceedings of the 22nd ACM SIGSAC conference on computer and communications security, pp. 1310–1321. Cited by: TABLE II.
  • [133] B. Simon (2019) How artificial intelligence will shape the future of malware. https://www.makeuseof.com/tag/artificial-intelligence-future-malware/. Cited by: §II-B1.
  • [134] F. Skopik, G. Settanni, and R. Fiedler (2016) A problem shared is a problem halved: a survey on the dimensions of collective cyber defense through security information sharing. Computers & Security 60, pp. 154–176. Cited by: §III-B.
  • [135] M. P. Stoecklin (2018) Deeplocker: how ai can power a stealthy new breed of malware. Security Intelligence, August 8. Cited by: §II-B1, TABLE II.
  • [136] P. Sun, J. Li, M. Z. Alam Bhuiyan, L. Wang, and B. Li (2019-04) Modeling and clustering attacker activities in IoT through machine learning techniques. Information Sciences 479, pp. 456–471. ISSN 00200255. Cited by: §III-A.
  • [137] X. Sun, P. Zhang, J. K. Liu, J. Yu, and W. Xie (2018) Private machine learning classification based on fully homomorphic encryption. IEEE Transactions on Emerging Topics in Computing. Cited by: TABLE II.
  • [138] X. Sun, J. Dai, A. Singhal, and P. Liu (2017) Enterprise-level cyber situation awareness. In Theory and models for cyber situation awareness, pp. 66–109. Cited by: item –, §III-C, §III-C, §III-D, TABLE III.
  • [139] C. Tang, Y. Xie, B. Qiang, X. Wang, and R. Zhang (2011) Security situation prediction based on dynamic bp neural with covariance. Procedia Engineering 15, pp. 3313–3317. Cited by: §IV-B1.
  • [140] T. A. Tang, L. Mhamdi, D. McLernon, S. A. R. Zaidi, and M. Ghogho (2016) Deep learning approach for network intrusion detection in software defined networking. In 2016 International Conference on Wireless Networks and Mobile Communications (WINCOM), pp. 258–263. Cited by: §IV-F1.
  • [141] X. Tao, D. Kong, Y. Wei, and Y. Wang (2016) A big network traffic data fusion approach based on fisher and deep auto-encoder. Information 7 (2), pp. 20. Cited by: §IV-A.
  • [142] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani (2009) A detailed analysis of the kdd cup 99 data set. In 2009 IEEE symposium on computational intelligence for security and defense applications, pp. 1–6. Cited by: item –.
  • [143] O. Thonnard and M. Dacier (2008-09) A framework for attack patterns’ discovery in honeynet data. Digital Investigation 5 (SUPPL.), pp. S128–S139. ISSN 17422876. Cited by: §III-A.
  • [144] H. Tianfield (2016) Cyber security situational awareness. In 2016 IEEE international conference on internet of things (iThings) and IEEE green computing and communications (GreenCom) and IEEE cyber, physical and social computing (CPSCom) and IEEE smart data (SmartData), pp. 782–787. Cited by: §IV-A.
  • [145] F. Tramer, N. Carlini, W. Brendel, and A. Madry (2020) On adaptive attacks to adversarial example defenses. arXiv preprint arXiv:2002.08347. Cited by: §II-B.
  • [146] F. Tramèr, F. Zhang, A. Juels, M. K. Reiter, and T. Ristenpart (2016) Stealing machine learning models via prediction apis. In 25th USENIX Security Symposium (USENIX Security 16), pp. 601–618. Cited by: §II-B2, TABLE II.
  • [147] K. Trieu and Y. Yang (2018) Artificial intelligence-based password brute force attacks. In Proceedings of the 13th Annual Conference of the Midwest AIS (MWAIS’18). Cited by: item –, item –, TABLE I, TABLE II.
  • [148] R. Vaarandi, B. Blumbergs, and M. Kont (2018) An unsupervised framework for detecting anomalous messages from syslog log files. In NOMS 2018-2018 IEEE/IFIP Network Operations and Management Symposium, pp. 1–6. Cited by: item –.
  • [149] R. Vaarandi (2005) Tools and techniques for event log analysis. Tallinn University of Technology Press. Cited by: item –.
  • [150] A. Valdes and K. Skinner (2000) Adaptive, model-based monitoring for cyber attack detection. In International Workshop on Recent Advances in Intrusion Detection, pp. 80–93. Cited by: §IV-F1.
  • [151] G. Vandenberghe (2008) Network traffic exploration application: a tool to assess, visualize, and analyze network security events. In International Workshop on Visualization for Computer Security, pp. 181–196. Cited by: item –.
  • [152] R. Vinayakumar, P. Poornachandran, and K. Soman (2018) Scalable framework for cyber threat situational awareness based on domain name systems data analysis. In Big data in engineering applications, pp. 113–142. Cited by: item –, TABLE I, §III-D, §IV-B1.
  • [153] R. Vinayakumar, K. Soman, P. Poornachandran, S. Akarsh, and M. Elhoseny (2019) Deep learning framework for cyber threat situational awareness based on email and url data analysis. In Cybersecurity and Secure Information Systems, pp. 87–124. Cited by: TABLE I, §IV-B1, §IV-F2.
  • [154] H. Wang, Y. Liang, and X. Liu (2008) Stochastic game theoretic method of quantification for network situational awareness. In 2008 International Conference on Internet Computing in Science and Engineering, pp. 312–316. Cited by: §IV-C.
  • [155] C. J. Watkins and P. Dayan (1992) Q-learning. Machine learning 8 (3-4), pp. 279–292. Cited by: §IV-D, §IV-D.
  • [156] J. Webb, A. Ahmad, S. B. Maynard, and G. Shanks (2014) A situation awareness model for information security risk management. Computers & security 44, pp. 1–15. Cited by: §V-B.
  • [157] M. Weir, S. Aggarwal, B. De Medeiros, and B. Glodek (2009) Password cracking using probabilistic context-free grammars. In 2009 30th IEEE Symposium on Security and Privacy, pp. 391–405. Cited by: TABLE II.
  • [158] K. Weiss and T. Khoshgoftaar (2016) A survey of transfer learning. Journal of Big Data 3 (1), pp. 1–40. Cited by: §IV-B1.
  • [159] J. White, J. S. Park, C. A. Kamhoua, and K. A. Kwiat (2013) Game theoretic attack analysis in online social network (osn) services. In Proceedings of the 2013 ieee/acm international conference on advances in social networks analysis and mining, pp. 1012–1019. Cited by: §IV-C.
  • [160] L. Williams, R. Lippmann, and K. Ingols (2008) An interactive attack graph cascade and reachability display. In VizSEC 2007, pp. 221–236. Cited by: item –.
  • [161] L. Williams, R. Lippmann, and K. Ingols (2008) GARNET: a graphical attack graph and reachability network evaluation tool. In International Workshop on Visualization for Computer Security, pp. 44–59. Cited by: item –.
  • [162] J. Wu, K. Ota, M. Dong, J. Li, and H. Wang (2016) Big data analysis-based security situational awareness for smart grid. IEEE Transactions on Big Data 4 (3), pp. 408–417. Cited by: item –, TABLE I, item –, §III-D, TABLE III, §IV-D.
  • [163] R. Xi, S. Jin, X. Yun, and Y. Zhang (2011) CNSSA: a comprehensive network security situation awareness system. In 2011 IEEE 10th International Conference on Trust, Security and Privacy in Computing and Communications, pp. 482–487. Cited by: Fig. 3, item –, §IV-G3.
  • [164] Z. Xia, P. Yi, Y. Liu, B. Jiang, W. Wang, and T. Zhu (2019) GENPass: a multi-source deep learning model for password guessing. IEEE Transactions on Multimedia 22 (5), pp. 1323–1332. Cited by: TABLE II.
  • [165] W. Xing-zhu (2016) Network information security situation assessment based on bayesian network. International Journal of Security and its Applications 10 (5), pp. 129–138. Cited by: §IV-D.
  • [166] B. Xue, M. Zhang, W. Browne, and X. Yao (2016) A survey on evolutionary computation approaches to feature selection. IEEE Transactions on Evolutionary Computation 20 (4), pp. 606–626. Cited by: §IV-F1.
  • [167] S. J. Yang, H. Du, J. Holsopple, and M. Sudit (2014) Attack projection. In Cyber Defense and Situational Awareness, pp. 239–261. Cited by: item –, TABLE I.
  • [168] Y. Yao, B. Viswanath, J. Cryan, H. Zheng, and B. Y. Zhao (2017) Automated crowdturfing attacks and defenses in online review systems. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 1143–1158. Cited by: TABLE II.
  • [169] J. Yen, M. McNeese, T. Mullen, D. Hall, X. Fan, and P. Liu (2010) RPD-based hypothesis reasoning for cyber situation awareness. In Cyber situational awareness, pp. 39–49. Cited by: §III-B.
  • [170] X. Yin, W. Yurcik, and A. Slagell (2005) The design of visflowconnect-ip: a link analysis system for ip security situational awareness. In Third IEEE International Workshop on Information Assurance (IWIA’05), pp. 141–153. Cited by: §IV-G3.
  • [171] L. Ying, L. Bingyang, and W. Huiqiang (2010) Dynamic awareness of network security situation based on stochastic game theory. In The 2nd International Conference on Software Engineering and Data Mining, pp. 101–105. Cited by: §IV-C, §IV-C.
  • [172] W. Yu, S. Wei, D. Shen, M. Blowers, E. P. Blasch, K. D. Pham, G. Chen, H. Zhang, and C. Lu (2013) On detection and visualization techniques for cyber security situation awareness. In Sensors and Systems for Space Applications VI, Vol. 8739, pp. 87390R. Cited by: Fig. 5, §V-C.
  • [173] Y. Yuan and F. Sun (2015) Data fusion-based resilient control system under dos attacks: a game theoretic approach. International Journal of Control, Automation and Systems 13 (3), pp. 513–520. Cited by: §IV-A.
  • [174] H. Zhang, Y. Yi, J. Wang, N. Cao, and Q. Duan (2018) Network security situation awareness framework based on threat intelligence. Computers, Materials and Continua 56 (3), pp. 381–399. Cited by: item –, item –, TABLE I, §III-B, TABLE III, §IV-C, §IV-C.
  • [175] Y. Zhang, S. Jin, X. Cui, X. Yin, and Y. Pang (2012) Network security situation prediction based on bp and rbf neural network. In International Conference on Trustworthy Computing and Services, pp. 659–665. Cited by: §IV-B1, §IV-B1.
  • [176] Y. Zhang and S. Jin (2014) Predicting network security situation based on a combination model of multiple neural networks. Int. J. Software and Informatics 8 (2), pp. 167–176. Cited by: §IV-B1, §IV-B1.
  • [177] Y. Zhang, X. Tan, X. Cui, and H. Xi (2011) Network security situation awareness approach based on markov game model. Journal of software 22 (3), pp. 495–508. Cited by: §IV-C, §IV-C.
  • [178] D. Zhao and J. Liu (2018) Study on network security situation awareness based on particle swarm optimization algorithm. Computers & Industrial Engineering 125, pp. 764–775. Cited by: §IV-B2, §IV-B2, §IV-B2.
  • [179] R. Zheng, D. Zhang, Q. Wu, M. Zhang, and C. Yang (2012) A strategy of network security situation autonomic awareness. In International Conference on Network Computing and Information Security, pp. 632–639. Cited by: §IV-B1.
  • [180] Y. Zheng, Y. Cao, and C. Chang (2019) UDhashing: physical unclonable function-based user-device hash for endpoint authentication. IEEE Transactions on Industrial Electronics 66 (12), pp. 9559–9570. Cited by: §IV-F3.
  • [181] C. Zhong, J. Yen, P. Liu, R. F. Erbacher, C. Garneau, and B. Chen (2017) Studying analysts’ data triage operations in cyber defense situational analysis. In Theory and Models for Cyber Situation Awareness, pp. 128–169. Cited by: §IV-A.