With the widespread use of technology, cyber-security has become a concern for both commercial organizations and governments. With the recent incidents of data breaches at Equifax, Verizon, Gmail, and others (https://www.identityforce.com/blog/2017-data-breaches), organizations are looking for methods to proactively identify whether they will be the target of future attacks. A 2017 Verizon investigation report stated that 75% of breaches were perpetrated by outsiders exploiting known vulnerabilities [1]. Monitoring the vulnerabilities that are of interest to malicious threat actors in the discussions on Darkweb/Deepweb (D2web) hacking sites is a key aspect of predicting cyber-attacks.
In this paper, we describe DARKMENTION, a system that identifies indicators of risks from unconventional sources of threat intelligence (D2web), monitors those sources in real-time to reason about the likelihood of future threats, generates warnings, and submits them to the Security Operations Center.
DARKMENTION is based on the concepts of causal reasoning [3, 4] and logic programming (in particular, the concept of the Point Frequency Function (pfr) from APT-logic [5, 6, 7]) to learn association rules. An example of the rules we sought to learn is "if certain D2web activity is observed at a given time point, then there will be x attacks of type y, targeting organization o, in exactly Δt time points, with probability p". Our data is obtained from a commercially available API maintained by a cyber-threat intelligence firm (CYR3CON, https://cyr3con.ai), and from over 500 historical records of real-world targeted cyber incidents. Those incidents are recorded from the logs of two large enterprises participating in the IARPA Cyber-attack Automated Unconventional Sensor Environment (CAUSE) program (https://www.iarpa.gov/index.php/research-programs/cause).
Throughout the paper, we illustrate the viability of DARKMENTION as a tool that addresses problems directly related to situational awareness, resource allocation, and countermeasure prioritization. In particular, we show that DARKMENTION produces warnings that are:
timely: they indicate the exact time point at which a predicted attack will occur,
actionable: they provide metadata/warning details, i.e., the target enterprise, type of attack, volume, and the software vulnerabilities/threat actors identified from the D2web discussions,
accurate: they predicted unseen real-world attacks with an average increase in F1 of over 45% for one enterprise and 57% for the other, and
transparent: they allow analysts to easily trace the warnings back to the rules that were triggered, the discussions that fired the rules, etc.
II. Dataset Description
II-A. D2web Crawling Infrastructure
Darkweb refers to the portion of the internet that is not indexed by search engines and cannot be accessed by standard browsers. Specialized browsers such as Tor (see the Tor Project's official website, https://www.torproject.org/) are required to access darkweb sites. We retrieve information from both marketplaces, where users advertise the sale of information regarding vulnerabilities or exploits, and forums, which host discussions on discovered vulnerabilities, among other topics.
We summarize the D2web crawling infrastructure that CYR3CON maintains, which was originally introduced in [8]. Customized lightweight crawlers and parsers were built for each site to collect and extract data. At the time of writing this paper, data is collected from more than 400 platforms (forums and marketplaces). To ensure the collection of cybersecurity-relevant data, machine learning models are used to filter out data related to drugs, weapons, and other discussions irrelevant to cybersecurity.
II-B. Data Pre-processing
CVE-CPE mapping: A Common Vulnerabilities and Exposures (CVE) identifier is a unique identifier assigned to each software vulnerability reported in the National Vulnerability Database (NVD [9]). Common Platform Enumeration (CPE) is a list of software/hardware products that are vulnerable to a given CVE. CPE data can be obtained from the NVD. We query the database using API calls to look for postings with software vulnerability mentions (in terms of CVE number). Regular expressions (see https://cve.mitre.org/cve/identifiers/syntaxchange.html) are used to identify CVE mentions. We map each CVE to pre-identified groups of CPEs (we cluster CPEs together based on common vendors/products; we identified over 100 groups of CPEs, e.g., Microsoft-Office, Apache-Tomcat, and Intel, though only 33 were used as preconditions in the rules) and to nation-state threat actors who are known to leverage that CVE as part of their attack tactics (we encoded a list of threat actors along with the vulnerabilities they favor by manually analyzing recently published cyber-threat reports from cybersecurity companies, e.g., https://media.kaspersky.com/en/business-security/enterprise/KL_Report_Exploits_in_2016_final.pdf). Among the well-known threat actors is the North Korean group HIDDEN COBRA, which was recently identified as accounting for an increasing number of cyberattacks on US targets (https://www.us-cert.gov/ncas/alerts/TA17-164A). These CPE and nation-state actor mappings are used as preconditions during the rule-learning phase discussed in Sections III and IV.
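To make the extraction step concrete, the sketch below shows how CVE mentions can be pulled from post text with a regular expression and mapped to CPE groups. The regular expression, the `CVE_TO_CPE_GROUP` table, and both function names are illustrative assumptions, not the deployed system's code; the real CPE groups are built by clustering NVD CPE data by vendor/product.

```python
import re

# Pattern following the post-2014 CVE ID syntax (4-digit year, 4+ digit
# number). The deployed system's exact expression is not published.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)

# Hypothetical CVE -> CPE-group mapping; real groups (e.g., Microsoft-Office,
# Apache-Tomcat) come from clustering NVD CPE data by vendor/product.
CVE_TO_CPE_GROUP = {
    "CVE-2017-0199": "Microsoft-Office",
    "CVE-2017-5638": "Apache-Struts",
}

def extract_cve_mentions(post_text):
    """Return the normalized, de-duplicated CVE IDs mentioned in a posting."""
    return sorted({m.upper() for m in CVE_PATTERN.findall(post_text)})

def map_to_cpe_groups(cves):
    """Map CVEs to their pre-identified CPE groups (rule preconditions)."""
    return sorted({CVE_TO_CPE_GROUP[c] for c in cves if c in CVE_TO_CPE_GROUP})

post = "Selling exploit for cve-2017-0199, also works with CVE-2017-5638."
mentions = extract_cve_mentions(post)
groups = map_to_cpe_groups(mentions)
```

Matching is case-insensitive because D2web postings are informal text; normalization to uppercase keeps the lookup table small.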
II-C. Enterprise-Relevant External Threats
To construct rules and evaluate the performance of the learned model, we use data from historical records of attack attempts recorded from the logs of two enterprises participating in the IARPA CAUSE program. One of the two enterprises is a defense industrial base (referred to as Armstrong) while the other is a financial services organization (referred to as Dexter). The database is distributed in increments to the performers participating in a contest led by IARPA, once every few months. Each data point is a record of a detected deliberate malicious attempt to gain unauthorized access, alter or destroy data, or interrupt services or resources in the environment of the participating organizations. These malicious attempts were detected in an uncontrolled environment by various commercial security products such as anti-virus software, intrusion detection systems, and hardware controls. Each ground truth (GT) record includes: ID, Format Version, Reported Time, Occurrence Time, Event Type, and Target Industry (we intentionally skip details about the other fields of the GT records due to space limitations and their irrelevance to the scope of this paper). The types of attacks included in the GT dataset are:
Malicious Email (M-E). A malicious attempt is identified as a Malicious Email event if an email is received by the organization, and it either contains a malicious email attachment, or a link (embedded URL or IP address) to a known malicious destination.
Malicious Destination (M-D). A malicious attempt is identified as a visit to a Malicious Destination if the visited URL or IP address hosts malicious content.
Endpoint Malware (E-M). A Malware on Endpoint event is identified if malware is discovered on an endpoint device. This includes, but is not limited to, ransomware, spyware, and adware.
Other events, such as those related to insider threats, are outside the scope of the current phase of the deployed system. A summary of the time periods and the number of records for each attack type is provided in Section V-C.
III. Deployed System
In this section we provide an overview of DARKMENTION. Figure 1 provides a graphical illustration of the system's workflow. Our system comprises three main components, each designed to serve a set of tasks. There are three reasons for this particular design: (1) other base models were implemented using the same GT data, (2) this design results in minimal changes to the roles of the subteams/members, and (3) model ensembles can improve the overall quality of the generated warnings [10].
Rule learning: The first component is responsible for learning rules, i.e., APT rules with the pfr function [5, 6, 7]. The inputs to the rule learner are: (1) the CPE groups/actors identified from the D2web discussions and the time points at which they are mentioned, and (2) the GT events with their types and the time points at which they were observed. The learner follows the technical approach discussed in Section IV to generate APT rules. The output of this component is a set of APT rules determined to be useful, i.e., exceeding thresholds on the probability of occurrence and on the support count (the frequency with which a rule is satisfied in the historical data). This capability aids in producing accurate and timely warnings. Figure 4 shows the percentage increase in the likelihood of occurrence of attack events per Δt days following D2web discussions for the rules identified as useful. We note that the reported records of Malicious Destination for Dexter only cover a time period that ends before the testing time period starts; therefore, no rules were produced for Malicious Destination. Additionally, the system did not learn useful rules relating to Armstrong's Malicious Destination events when Δt is 1, 2, or 5.
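A minimal sketch of the usefulness filter described above: a candidate rule is kept only when both its probability lower bound and its support count clear their thresholds. The candidate tuples and the threshold values are hypothetical; the deployed thresholds are not stated in the paper.

```python
# Hypothetical candidate rules: (precondition, event_type, delta_t,
# lower-bound probability, support count).
CANDIDATES = [
    ("Microsoft-Office", "malicious-email",  3, 0.45, 12),
    ("Apache-Tomcat",    "endpoint-malware", 5, 0.20, 30),  # prob too low
    ("hidden-cobra",     "malicious-email",  2, 0.60, 2),   # support too low
]

def useful_rules(candidates, min_prob=0.3, min_support=5):
    """Keep a rule only if its probability lower bound AND its support
    count (times the precondition fired historically) clear thresholds."""
    return [r for r in candidates if r[3] >= min_prob and r[4] >= min_support]

kept = useful_rules(CANDIDATES)
```

Requiring both conditions discards rules that look strong but were observed too rarely to be trusted, and rules that fire often but rarely precede an attack.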
Generating warnings: The second component is responsible for generating warnings using the APT rules. This component runs daily, first acquiring all CVEs mentioned in the last 24 hours of the D2web streaming data. The CPE groups/nation-state actors for these mentioned CVEs are then obtained. Next, the model tries to match the CPE/nation-state actor mappings against the APT rules. If a match exists, the model predicts if and when an attack exploiting the vulnerabilities will occur, and generates warnings accordingly. The warning fields are populated using the information contained in the rule, such as the probability, event type, and target organization. These details help produce actionable warnings, i.e., warnings that provide metadata/details including the CVEs/tactics, industry, volume of discussions, etc. Section IV-C provides further details about the way warnings are generated from rules.
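The daily warning-generation loop can be sketched as follows. The `RULES` table, its entries, and the warning fields shown are illustrative assumptions; the deployed system populates additional metadata (CVEs/tactics, industry, volume) as described above.

```python
import datetime

# Illustrative rule table: precondition -> (event_type, delta_t_days, prob).
RULES = {
    "Microsoft-Office": ("malicious-email", 3, 0.45),
    "hidden-cobra":     ("endpoint-malware", 2, 0.60),
}

def generate_warnings(todays_preconditions, today):
    """For each CPE group/actor seen in the last 24 hours of D2web data,
    trigger matching rules and date the warning exactly delta_t days out."""
    warnings = []
    for pre in sorted(todays_preconditions):
        if pre in RULES:
            event_type, delta_t, prob = RULES[pre]
            warnings.append({
                "event_type": event_type,
                "predicted_date": today + datetime.timedelta(days=delta_t),
                "probability": prob,
                "triggering_precondition": pre,  # keeps the warning traceable
            })
    return warnings

ws = generate_warnings({"Microsoft-Office", "Linux-Kernel"},
                       datetime.date(2017, 7, 1))
```

Storing the triggering precondition on each warning is what lets an analyst later trace a warning back to the rule and the D2web discussion that fired it.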
Model fusion: The final component deals with consolidation. It fuses warnings from various heterogeneous models (including DARKMENTION), populates any missing warning fields according to the program requirements, and generates the final version of each warning. The completed warning is submitted to the Security Operations Center. Each submitted warning is available to view and drill down into using a Web UI and an audit-trail analysis capability. The audit trail goes from the submitted warning all the way through model fusion, the individual models, and each individual model's raw data used to generate the warning. In the case of DARKMENTION, this includes the D2web postings/items with the mentioned CVEs highlighted. This capability makes the warnings that DARKMENTION produces transparent, i.e., analysts can easily trace the warnings back to the rules that were triggered, the discussions that fired the rules, etc. Figure 5 shows two screenshots taken from the system.
IV. Annotated Probabilistic Temporal Logic (APT-logic)
Here we define the syntax and semantics of APT-logic programs (sets of APT-logic rules) as applied to our domain, building upon the work of Shakarian et al. [5].
Herbrand base. We use B_L to denote the Herbrand base (finite set of ground atoms) of a first-order logical language L. We divide B_L into two disjoint sets, B_cond and B_act, so that B_L = B_cond ∪ B_act. B_cond comprises the atoms allowed only in the premise of APT rules, representing conditions or user actions performed on D2web websites, e.g., mention_on(forum_1, debian). B_act comprises the atoms allowed only in the conclusion of APT rules, representing malicious activities reported by the Armstrong or Dexter organizations in their own facilities, e.g., attack(armstrong, malicious-email).
Formulas. Complex sentences (formulas) are constructed recursively from atoms, using parentheses and the logical connectives ¬ (negation), ∨ (disjunction), and ∧ (conjunction). However, we note that all formulas in this paper are single atoms.
Time formulas. If F is a formula and t is a time point, then F:t is a time formula, which states that F is true at time t.
Probabilistic time formulas. If F:t is a time formula and [l, u] ⊆ [0, 1] is a probability interval, then F:t:[l, u] is a probabilistic time formula (ptf). Intuitively, F:t:[l, u] says that F:t is true with a probability in [l, u]; or, using the complete notation, F:t:[l, u] says that F is true at time t with a probability in [l, u].
APT rules. Suppose F (a condition) and G (an action) are formulas, Δt is a natural number, [l, u] is a probability interval, and fr is a frequency function symbol that we will define later. Then F ~>_fr G : [Δt, l, u] is an APT (Annotated Probabilistic Temporal) rule, which, informally speaking, gives the probability that G is true exactly Δt time units after F becomes true. For instance, the APT rule below states that the probability of the Armstrong company being hit by a malicious-email attack exactly 3 time units after users mention "debian" on forum_1 is between 40% and 50%.
mention_on(forum_1, debian) ~>_pfr attack(armstrong, malicious-email) : [3, 0.4, 0.5]
World. In general, a world is any set of ground atoms belonging to B_L. However, due to our constraint separating atoms into B_cond and B_act, not all possible worlds are allowed in our APT rules. Strictly, an allowable world for our pfr rules must contain one atom belonging to B_cond and one atom belonging to B_act.
Thread. A thread Th is a series of worlds that models the domain over time, where each world corresponds to a discrete time point in T = {1, 2, ...}. Th(t) specifies that, according to the thread Th, the world at time t will be Th(t). Given a thread Th and a time formula F:t, we say Th satisfies F:t (denoted Th ⊨ F:t) iff:
If F:t is a ground atomic time formula, then Th satisfies F:t iff F ∈ Th(t);
If F:t = ¬F':t for some ground time formula F':t, then Th does not satisfy F':t;
If F:t = (F' ∧ F''):t for some ground time formulas F':t and F'':t, then Th satisfies F':t and Th satisfies F'':t;
If F:t = (F' ∨ F''):t for some ground time formulas F':t and F'':t, then Th satisfies F':t or Th satisfies F'':t;
Frequency functions. A frequency function represents temporal relationships within a thread, checking how often a world satisfying formula F is followed by a world satisfying formula G. Formally, a frequency function fr maps quadruples of the form (Th, F, G, Δt) to the interval [0, 1] of real numbers. Among the possible frequency functions proposed in [5], we investigate here an alternative definition of the Point Frequency Function (pfr), which specifies how frequently the action G follows the condition F in exactly Δt time points. To support ongoing security operations, we need to relax the assumption of a finite time horizon made in [5, 6]. We introduce here a different but equivalent formulation of pfr that does not rely on a finite time horizon. To accomplish this, we first need to define how a ptf can be satisfied in our model. If we consider the ptf F:t:[l, u] and the set S of all ptfs satisfied by our thread Th, we say F:t:[l, u] ∈ S iff:
If F is a ground atom, then F:t:[l, u] ∈ S;
If F = ¬F' for some ground formula F', then F':t:[1 − u, 1 − l] ∈ S;
If F = F' ∧ F'' for some ground formulas F' and F'', then F':t:[l, u] ∈ S and F'':t:[l, u] ∈ S;
If F = F' ∨ F'' for some ground formulas F' and F'', then F':t:[l, u] ∈ S or F'':t:[l, u] ∈ S;
Satisfaction of APT rules and programs. We say Th satisfies an APT rule F ~>_pfr G : [Δt, l, u] (denoted Th ⊨ F ~>_pfr G : [Δt, l, u]) iff pfr(Th, F, G, Δt) ∈ [l, u]; Th satisfies an APT program iff it satisfies every rule in the program.
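To make pfr concrete, the sketch below computes the empirical point frequency over a thread represented as a list of worlds (one set of ground atoms per time point). It restricts formulas to single atoms, matching the paper's usage, and follows the original finite-horizon definition of [5, 6] for simplicity, whereas the deployed system uses the equivalent horizon-free formulation. The thread contents are invented for illustration.

```python
def pfr(thread, condition, action, delta_t):
    """Point Frequency Function over a thread (list of worlds): among worlds
    satisfying `condition`, the fraction followed exactly delta_t time
    points later by a world satisfying `action`. Single-atom formulas only."""
    # Finite-horizon variant: condition occurrences too close to the end of
    # the thread to be followed by delta_t more worlds are not counted.
    indices = [t for t in range(len(thread) - delta_t) if condition in thread[t]]
    if not indices:
        return 0.0
    return sum(1 for t in indices if action in thread[t + delta_t]) / len(indices)

# Toy thread using atoms in the style of the example rule above.
thread = [
    {"mention_on(forum_1,debian)"},          # t=0: condition holds
    set(),                                   # t=1
    set(),                                   # t=2
    {"attack(armstrong,malicious-email)"},   # t=3: action follows after 3
    {"mention_on(forum_1,debian)"},          # t=4: condition holds again
    set(),                                   # t=5
    set(),                                   # t=6
    set(),                                   # t=7: no attack this time
]
prob = pfr(thread, "mention_on(forum_1,debian)",
           "attack(armstrong,malicious-email)", 3)
```

Here the condition holds twice and is followed by the action once, so the empirical point frequency is 0.5; a thread satisfies the example rule above iff this value falls inside [0.4, 0.5].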
IV-C. Rules and Warnings
Probability intervals. We derive the probability intervals related to all (p, n) pairs, where n is the number of events that produced the point probability p.
APT programs. Our algorithm adds to the logic program only the pfr rules whose probability lower bounds exceed the prior probability of the rule's head occurring at any random time point.
Warning generation. The problem is to decide whether a triggered rule should generate warnings, and how many warnings to generate. When no rule is triggered on a given day, no warnings are generated. When two rules are triggered on the same day, both predicting that the same attack type will occur on the same day (i.e., with the same Δt), and one predicts x attacks while the other predicts x' attacks, then they will generate warnings if both are qualified to generate warnings. A pfr rule r is qualified to generate x warnings if (1) the rule is triggered, and (2) there is no other triggered rule on the same day with the same rule head and Δt as r's but with a higher point probability than r's.
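One plausible reading of qualification condition (2) can be sketched as follows: among rules triggered on the same day that share a rule head and Δt, only the rule with the highest point probability generates its warnings. The tuples below are invented for illustration, and this is an interpretation of the condition, not the deployed logic.

```python
def qualified_warnings(triggered):
    """Among same-day triggered rules sharing (attack_type, delta_t), keep
    only the highest-point-probability rule's warning count. `triggered`
    holds tuples (attack_type, delta_t, point_prob, n_warnings)."""
    best = {}
    for attack_type, delta_t, prob, n in triggered:
        key = (attack_type, delta_t)
        if key not in best or prob > best[key][0]:
            best[key] = (prob, n)
    return {key: n for key, (prob, n) in best.items()}

triggered = [
    ("malicious-email",  3, 0.45, 2),
    ("malicious-email",  3, 0.60, 1),  # same head and delta_t, higher prob: wins
    ("endpoint-malware", 2, 0.30, 1),  # no competitor: qualified
]
counts = qualified_warnings(triggered)
```

Rules with distinct heads or distinct Δt do not compete, so both remaining keys generate their warnings independently.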
V. Experimental Results
In this section, we provide evidence of the viability of our approach through a series of experiments. The warnings submitted by our system are evaluated by the Security Operations Centers (SOCs) on a monthly basis. However, we evaluated DARKMENTION internally, since the external evaluations are aggregated across all models, and DARKMENTION was operationally deployed only after the time periods those reports cover.
V-A. Experimental Settings
We perform evaluations on the warnings targeting Armstrong that were submitted during July, August, and September of 2017. The results are aggregated per month for the experiments on Armstrong data, and over periods of 7 days for Dexter; the latter spans July 1 to July 28, 2016. These time windows differ because the Armstrong dataset covers a longer period of time than the period covered by Dexter, and no further GT data about Dexter will be provided or evaluated by the program. The reported records of Malicious Destination for Dexter only cover a time period that ends before the testing time period starts; hence they are not evaluated.
V-B. Evaluation Metrics
To evaluate the accuracy of our system, we use three metrics: recall, the fraction of GT events that have matching warnings out of the total number of GT events; precision, the fraction of warnings that have matching GT events out of the total number of generated warnings; and F1, the harmonic mean of recall and precision.
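The three metrics reduce to simple ratios once the number of mutually exclusive warning-GT pairs is known. A sketch, with invented counts:

```python
def evaluation_metrics(n_matched, n_warnings, n_gt_events):
    """Precision, recall, and F1 from the count of matched warning-GT pairs."""
    precision = n_matched / n_warnings if n_warnings else 0.0
    recall = n_matched / n_gt_events if n_gt_events else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Illustrative counts: 6 matched pairs out of 10 warnings and 8 GT events.
p, r, f1 = evaluation_metrics(n_matched=6, n_warnings=10, n_gt_events=8)
```

With these counts, precision is 0.6, recall is 0.75, and F1 is their harmonic mean, 2/3.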
Matching warnings and GT events. DARKMENTION predicts the exact number of attacks on each day. Therefore, the matching problem is to determine whether a warning w earns credit for predicting a GT attack event e. If w predicts an attack of a different type than e's, or predicts an attack on a different day than the occurrence day of e, then they do not match. Otherwise, they may or may not match, depending on whether w or e has already been paired with another GT event or warning, respectively.
To join warnings with GT events in pairings while ensuring that the resulting pairings are mutually exclusive, we use the Hungarian assignment algorithm [12]. Intuitively, the algorithm takes as input an n × n matrix of lead-times, where the lead-time is the time between a warning and a GT event that are qualified to match. The algorithm then returns a solution that maximizes the total lead-time (lead-time is a metric set by the program to evaluate the performers, but we ignore it in this evaluation). This solution is a set of pairs, each mapping a warning to a GT event, such that the pairs are guaranteed to be mutually exclusive (we add dummy rows (warnings) or columns (GT events) with lead-times of −1 to make the matrix square, then remove from the solution the pairs whose lead-time is −1). We store in the database the pairs returned by the algorithm.
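The assignment step can be illustrated with a brute-force maximizer over a toy lead-time matrix; the deployed system uses the Hungarian algorithm [12] for the same task at scale. The matrix entries are invented, and −1 marks warning/GT pairs that are not qualified to match (the dummy-entry trick described above).

```python
from itertools import permutations

def best_assignment(lead_time):
    """Brute-force maximum-total-lead-time assignment on a square matrix.
    Entries of -1 mark warning/GT pairs not qualified to match; such pairs
    are dropped from the returned solution, as in the paper's procedure."""
    n = len(lead_time)
    best_pairs, best_total = [], float("-inf")
    for perm in permutations(range(n)):
        total = sum(lead_time[i][perm[i]] for i in range(n))
        if total > best_total:
            best_total = total
            best_pairs = [(i, perm[i]) for i in range(n)]
    return [(w, g) for w, g in best_pairs if lead_time[w][g] != -1]

# Rows: warnings, columns: GT events; entries: lead-time in days (or -1).
matrix = [
    [2, -1,  1],
    [-1, 3, -1],
    [1, -1, -1],
]
pairs = best_assignment(matrix)
```

Because maximizing total lead-time pushes the solver away from −1 entries, the optimal solution here pairs warning 0 with event 2, warning 1 with event 1, and warning 2 with event 0 (total lead-time 5). Brute force is O(n!) and only suitable for illustration; the Hungarian algorithm solves the same problem in polynomial time.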
We found that our system outperforms a baseline system that randomly generates k warnings on each day, where each value of k has a chance proportional to its frequency of occurrence in the historical data. We run the baseline 100 times and take the average of each metric. In the real-time deployment of DARKMENTION, human experts can vet the warnings by leveraging the other capabilities of the system, i.e., its transparency and actionability, through a Web UI dashboard. In these experiments, however, every triggered rule is counted, even ones that an analyst might judge unimportant given other details. That said, our system scored significantly higher than the baseline system, as shown in Table I.
Table I: For each dataset and attack type: the testing start date, the number of GT events, the scores of DARKMENTION and of the baseline (averaged over 100 runs), and the percentage increase in F1.
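The baseline's per-day warning counts can be simulated by sampling k in proportion to its historical frequency. A sketch, with an invented history:

```python
import random
from collections import Counter

def baseline_warnings(historical_daily_counts, n_days, rng):
    """Baseline: each day, draw the number of warnings k with probability
    proportional to how often k attacks/day occurred historically."""
    counts = Counter(historical_daily_counts)
    values = sorted(counts)
    weights = [counts[v] for v in values]
    return [rng.choices(values, weights=weights)[0] for _ in range(n_days)]

# Invented history: 3 days with 0 attacks, 2 days with 1, 1 day with 2.
rng = random.Random(0)  # seeded for reproducibility
sim = baseline_warnings([0, 0, 0, 1, 1, 2], n_days=30, rng=rng)
```

Averaging the metrics over 100 such simulated runs, as in the evaluation above, smooths out the randomness of any single draw.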
VI. Related Work
To the best of our knowledge, this paper is the first to present a deployed system that applies causal reasoning to predict enterprise-related external cyber threats. However, there exists a large body of related research, including studies of the malicious hacking community on the D2web and studies of rule-learning methods for security applications.
Hacking community on D2web. While the hacking community on D2web sites has been widely investigated in the existing literature, for applications such as analyzing the economics of D2web forums/markets [13, 14] and identifying future cyber-threats to mitigate risks [2, 15], none of these studies identify threats related to specific corporations or identify when in the future the predicted events may occur. DARKMENTION specifically predicts enterprise-targeted attacks and the periods in which those threats are expected.
Rule-learning methods for security applications. A large body of work has been proposed on understanding and modeling the behavior of threat actors and reasoning about future risk levels to assist in the implementation of strategic defenses. While a growing number of studies along these lines have focused on understanding the behavior of terrorist and insurgent groups [16, 7], cyber-security applications have not received much attention, except in the lines of graph-based and attacker/defender game-theoretic modeling [17, 18]. Our work differs from such theoretical approaches in that we focus on practical details related to deployment and evaluate our system with real-world data.
VII. Conclusion
We presented DARKMENTION, a deployed system that produces warnings about cyber incidents that are likely to occur in the future. Although the problem is difficult, our system has proven useful as a tool that helps SOC teams identify risks, potential sources of risk (vulnerabilities or threat actors), and the context on which it builds its reasoning, in a timely, actionable, accurate, and transparent manner. Our team was selected to progress to the second phase of the CAUSE program.
Acknowledgments
Some of the authors were supported by the Office of Naval Research (ONR) Neptune program, the ASU Global Security Initiative (GSI), and the National Council for Scientific and Technological Development (CNPq-Brazil). Paulo Shakarian, Dipsy Kapoor, and Timothy Siedlecki are supported by the Office of the Director of National Intelligence (ODNI) and the Intelligence Advanced Research Projects Activity (IARPA) via the Air Force Research Laboratory (AFRL) contract number FA8750-16-C-0112. Gerardo Simari is also partially supported by Universidad Nacional del Sur (UNS) and CONICET, Argentina. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of ODNI, IARPA, AFRL, or the U.S. Government.
References
[1] Verizon, "2017 data breach investigations report," 2017. [Online]. Available: https://www.ictsecuritymagazine.com/wp-content/uploads/2017-Data-Breach-Investigations-Report.pdf
[2] M. Almukaynizi, E. Nunes, K. Dharaiya, M. Senguttuvan, J. Shakarian, and P. Shakarian, "Proactive identification of exploits in the wild through vulnerability mentions online," in 2017 International Conference on Cyber Conflict (CyCon U.S.), Nov. 2017, pp. 82–88.
[3] P. Suppes, A Probabilistic Theory of Causality. Amsterdam: North-Holland Publishing Company, 1970.
[4] S. Kleinberg and B. Mishra, "The temporal logic of causal structures," in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence. AUAI Press, 2009, pp. 303–312.
[5] P. Shakarian, A. Parker, G. I. Simari, and V. S. Subrahmanian, "Annotated probabilistic temporal logic," ACM Trans. Comput. Logic, vol. 12, no. 2, pp. 14:1–14:44, Jan. 2011.
[6] P. Shakarian, G. I. Simari, and V. S. Subrahmanian, "Annotated probabilistic temporal logic: Approximate fixpoint implementation," ACM Trans. Comput. Logic, vol. 13, no. 2, pp. 13:1–13:33, Apr. 2012.
[7] A. Stanton, A. Thart, A. Jain, P. Vyas, A. Chatterjee, and P. Shakarian, "Mining for causal relationships: A data-driven study of the Islamic State," in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2015, pp. 2137–2146.
[8] E. Nunes, A. Diab, A. Gunn, E. Marin, V. Mishra, V. Paliath, J. Robertson, J. Shakarian, A. Thart, and P. Shakarian, "Darknet and deepnet mining for proactive cybersecurity threat intelligence," in Proceedings of ISI 2016. IEEE, 2016, pp. 7–12.
[9] NIST, "National vulnerability database," https://nvd.nist.gov/, last accessed: Jan. 2018.
[10] J. M. Montgomery, F. M. Hollenbach, and M. D. Ward, "Improving predictions using ensemble Bayesian model averaging," Political Analysis, vol. 20, no. 3, pp. 271–291, 2012.
[11] G. Wadsworth and J. Bryan, Introduction to Probability and Random Variables. McGraw-Hill, 1960. [Online]. Available: https://books.google.com/books?id=NNtQAAAAMAAJ
[12] J. Munkres, "Algorithms for the assignment and transportation problems," Journal of the Society for Industrial and Applied Mathematics, vol. 5, no. 1, pp. 32–38, 1957.
[13] M. Motoyama, D. McCoy, K. Levchenko, S. Savage, and G. M. Voelker, "An analysis of underground forums," in Proceedings of the 2011 ACM SIGCOMM Internet Measurement Conference. ACM, 2011, pp. 71–80.
[14] L. Allodi, "Economic factors of vulnerability trade and exploitation," in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2017, pp. 1483–1499.
[15] N. Tavabi, P. Goyal, M. Almukaynizi, P. Shakarian, and K. Lerman, "DarkEmbed: Exploit prediction with neural language models," in Proceedings of the AAAI Conference on Innovative Applications of AI (IAAI-18), 2018.
[16] V. S. Subrahmanian, A. Mannes, A. Roul, and R. Raghavan, Indian Mujahideen: Computational Analysis and Public Policy. Springer Science & Business Media, 2013.
[17] J. Robertson, V. Paliath, J. Shakarian, A. Thart, and P. Shakarian, "Data driven game theoretic cyber threat mitigation," 2016.
[18] M. Brown, W. B. Haskell, and M. Tambe, "Addressing scalability and robustness in security games with multiple boundedly rational adversaries," in International Conference on Decision and Game Theory for Security. Springer, 2014, pp. 23–42.