The escalation of cyber-attacks we are currently facing is expected to continue, also due to the foreseen exponential increase in the use of IoT and smart devices. Users’ growing dependence on these connected devices increases their exposure to cyber-attacks. The growing frequency and severity of cyber-attacks come with an increase in the economic costs of the damage they cause. The existing protective and mitigative measures are not sufficient given the sophistication of current attacks. This creates the need for efficient preventive and mitigative measures that are attacker-oriented, in particular countermeasures specific to the attacker or group of attackers performing the attack. Furthermore, discovering who performed an attack and bringing these entities to justice can be a deterrent against future cyber-attacks.
Attacker-oriented countermeasures require discovering the perpetrator of the attack, or the entity related to it. Attribution is the process of assigning a cyber-attack action to a particular entity, attacker, or group of attackers. Currently, the attribution of cyber-attacks is mainly a manual process performed by the forensic analyst; it depends heavily on the analyst’s knowledge and is therefore prone to human bias and error. Performing the attribution is not trivial, as attackers often use deceptive and anti-forensics techniques, and analysts need to analyze an enormous amount of data, filter and classify it, and reach a conclusion as swiftly as possible. The increasing use of IoT devices makes the analysts’ work harder and the attribution process more expensive, as analysts might need to physically access the devices in order to retrieve their data.
Digital forensics helps during the attribution process, as it collects and analyzes the evidence left by the attack, but it cannot deal with conflicting or incomplete information. It only works with technical evidence and fails to consider other types of evidence, such as geopolitical situations and social-cultural intelligence, that provide useful leads during the investigation. Digital forensics tools mainly focus on collecting the pieces of evidence, which are then handed to the analyst for analysis, making the process extremely human-intensive and often requiring many skilled analysts to work for weeks or even months [8, 37]. The problem is aggravated by the large proportion of unstructured text data, which makes automatic analysis challenging.
In this work, we propose an argumentation-based reasoner (ABR) that helps the forensic analyst during the evidence analysis and attribution process. Given the pieces of cyber forensic and social evidence of a cyber-attack, the proposed reasoner automatically analyzes them and derives new information that is provided to the analyst. In particular, ABR can answer queries such as who the perpetrator of the attack is, who has the motives to perform it, which capabilities are needed to perform it, or which past attacks it resembles. Furthermore, ABR suggests other investigation paths to the analyst by hinting at which other pieces of evidence can be collected to reach a conclusion, enabling prioritised evidence collection. Our reasoner builds on our previous work, where we briefly presented the main intuition behind ABR. In this paper, we present a complete and fully automatic ABR, together with its reasoning rules, populated background knowledge, and evaluation. To the best of our knowledge, this is the first automatic reasoner for analyzing and attributing cyber-attacks that uses both technical and social evidence and that can work with conflicting and incomplete knowledge.
The reasoner consists of a set of reasoning rules, preferences between them, and background knowledge. The rules are constructed by analyzing the attribution process of well-known cyber-attacks (e.g., APT1, WannaCry) and extracting the reasoning used by forensic analysts. The rules are designed to be generic so that they apply to different attack scenarios. The background knowledge incorporates the common knowledge that analysts use during the analysis process. Our reasoner attributes an attack using both technical and social pieces of evidence, which are represented by means of a social model.
ABR is able to work with incomplete and conflicting evidence. We base ABR on an argumentation framework, in particular a preference-based argumentation framework, which makes it possible to deal with conflicting pieces of information by introducing preferences between them. We use preference-based argumentation because it is similar to the decision-making process followed by digital forensics investigators. ABR is built with the Gorgias tool, which combines abductive reasoning with preference-based argumentation. Abduction allows ABR to reach conclusions even with incomplete information: the missing information is abduced and then suggested to the analyst as hints of possible evidence to be collected.
ABR is a tool constructed to assist the analyst during the analysis process by giving a full picture of the analyzed evidence. Therefore, together with the answer to a query, it provides an explanation and the evidence used to reach that conclusion. Furthermore, ABR gives the analyst hints about missing evidence which, if provided, allows other investigation paths to be pursued. We constructed ABR to be flexible and adaptable to user requests and changes. The use of ABR helps to promote best practices when dealing with cyber-attacks and to share lessons learnt from past experience.
In Sec. 2 we present relevant related work. We introduce our reasoner in Sec. 3. In Sec. 4 and 5 we present ABR’s main components, its reasoning rules and its background knowledge, respectively. We give an overall evaluation and discussion in Sec. 6. In Sec. 7 we conclude and present future work.
2 Related Work
Attribution of a cyber-attack is the process of “determining the identity or location of an attacker or an attacker’s intermediary”. Tracing the origin of a cyber-attack is difficult, since attackers can easily forge or obscure source information and use anti-forensics tools, as they usually want to avoid being detected and identified. Digital forensics plays an enormous role in attribution by collecting, examining, analyzing and reporting the evidence. Other techniques designed to protect systems are also used to collect forensic data, e.g., traceback techniques, honeypots, and other deception techniques [1, 2].
Digital forensics comes with its own challenges, which can mainly be categorised into complexity problems, as the collected data are in a low-level raw format and require substantial resources to analyze, and quantity problems, as the amount of collected data is too large to be analyzed manually. Forensics techniques identify and collect the evidence, which is later managed and analyzed by the forensic analyst. Since the data are often collected from different sources, and attackers plant false evidence to throw investigators off their trail, investigators are likely to face multiple pieces of conflicting evidence. Digital forensics techniques can deal with conflicting information during the evidence collection phase [5, 13, 16], but lack the ability to work with conflicting pieces of evidence during the analysis and attribution process. Another problem of digital forensics techniques is that they cannot reason with incomplete information and arrive at a conclusion only when they have all the needed evidence. Digital forensics only uses technical evidence [35, 37] and fails to consider other kinds of evidence, such as geopolitical situations and social-cultural intelligence, which could provide useful leads during investigations.
A theoretical social-science model, called the Q-Model, describes how analysts put together technical and social evidence during the attribution process. In this model, attribution is described as an incremental process that passes from one level of attribution to the next. The Q-Model represents how forensic investigators perform attribution, and particular attention is paid to social evidence, where contextual knowledge, such as ongoing conflicts between countries or rivalry between corporations, is very useful for detecting the motives of potential culprits.
We use argumentation for our reasoner because it supports the analysis and attribution process: it is transparent and encourages the evaluation of arguments by assessing the relative importance of various factors when making decisions. Argumentation captures the fact that the final decision might change if more information becomes available (i.e., non-monotonic reasoning), as more information might reveal new arguments that conflict with the original winning ones and are stronger than them. Argumentation has already been applied to the attribution problem: the authors of [30, 36] propose the DeLP3E framework to attribute cyber-attack operations. This theoretical framework extends Defeasible Logic Programming with probabilistic uncertainty. DeLP3E does not deal with incomplete evidence; thus, it cannot make assumptions in order to reach a conclusion, and it cannot suggest new investigation paths or new evidence to be collected. It lacks general technical and social common knowledge, e.g., ongoing conflicts/rivalries between countries/corporations, information about past attacks, and cyber security capabilities of entities, which can be very useful for detecting motives, capabilities and potential culprits. DeLP3E measures its conclusions by the probability of an event being true, which requires the user to provide the probability of each given piece of evidence being true, and it does not distinguish the different levels of reasoning that can be applied to arrive at certain conclusions.
Despite the advances in using digital forensics or argumentation for attribution, some shortcomings remain to be resolved. The most important one is that none of the current works targets the social aspect of attribution. Moreover, the current state of the art does not deal with incomplete evidence, an important aspect of forensic investigations, since usually not all evidence can be collected, due to time/resource constraints as well as the anti-forensics tools used by the attacker. We believe our reasoner is the first attempt to use a social model to categorise evidence and rules in an argumentation-based framework, leading to a more accurate and explainable attribution that helps the analyst during the analysis process, also in the case of conflicting and incomplete evidence.
3 Argumentation-Based Reasoner for Attribution
In this work, we introduce an argumentation-based reasoner (ABR) built on a preference-based argumentation framework. ABR is composed of two main components, the reasoning rules and the background knowledge, see Figure 1. Given the pieces of evidence as input, ABR analyzes them and outputs answers to the queries, the possible perpetrator of the attack, or suggestions of further pieces of evidence that can be provided to the reasoner in order to perform a more precise or a different attribution. The reasoning rules are extracted from past cyber-attacks and are constructed using the argumentation framework. They are divided into three layers, technical, operational and strategic, following the structure of the social model we use. The background knowledge is information extracted from past cases and pertinent facts. ABR receives from the user the pieces of evidence (technical and social), analyzes them using the reasoning rules and the background knowledge, and returns to the analyst answers to his/her queries (e.g., whether a given entity performed the attack), together with an explanation of how the conclusion was reached and hints about which other pieces of evidence s/he can collect to perform a more precise or a new attribution.
3.1 Argumentation Framework for Attribution
We base our reasoner for the attribution process on a preference-based argumentation framework [20, 18], as it lets the user make decisions while working with conflicting evidence, and it naturally encodes the different layers of reasoning through its preference relations. This framework closely simulates the analysis and attribution process performed by the analyst, who needs to use different reasoning rules that concern technical and social aspects of the attack, have exceptions between them, and can derive conflicting conclusions. Our framework allows the analyst to work with conflicting evidence and with reasoning rules that derive conflicting conclusions, by introducing preferences between them that express exceptions or are context dependent. Argumentation also makes it possible to provide an explanation of the given results. Let us briefly introduce the framework.
An argumentation theory is a pair $(\mathcal{T}, \mathcal{P})$ of argument rules $\mathcal{T}$ and preference rules $\mathcal{P}$. The argument rules $\mathcal{T}$ are a set of labelled formulas of the form:

$$\mathit{rule}_i : L \leftarrow L_1, \ldots, L_n$$

where $L, L_1, \ldots, L_n$ are positive or negative ground literals, and $\mathit{rule}_i$ is the label denoting the rule name. In the above argument rule, $L$ denotes the conclusion of the argument rule and $L_1, \ldots, L_n$ denote its premises. The premises of an argument rule are the conditions required for the conclusion to be true. In our framework, the argument rules are the reasoning rules used by ABR. Let us show below a reasoning rule that is part of ABR:
where the rule name, i.e., the first argument, is the label of the rule; the head is the second argument and represents the conclusion of the rule; and the body predicates are the literals following the head and represent the premises of the rule.
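To make the labelled rule form $\mathit{rule}_i : L \leftarrow L_1, \ldots, L_n$ concrete, the following is a minimal Python sketch, not ABR itself (which is built with Gorgias). It assumes ground literals are encoded as tuples and that a rule is applied by naive forward chaining; conflicts and preferences are deliberately ignored here. The rule label `r_culprit` and the predicate names `isCulprit`, `hasCapability` and `hasMotive` are hypothetical placeholders.

```python
from typing import NamedTuple

# Ground literals are tuples, e.g. ("hasMotive", "x", "att").
# A negative literal would be written ("neg", <positive literal>).


class Rule(NamedTuple):
    label: str    # rule name (hypothetical labels below)
    head: tuple   # conclusion L
    body: tuple   # premises L1, ..., Ln


def derive(rules, facts):
    """Naive forward chaining over ground rules until a fixpoint is reached.

    Conflicts and preferences are deliberately ignored in this sketch.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.head not in known and all(p in known for p in rule.body):
                known.add(rule.head)
                changed = True
    return known


# Hypothetical rule: isCulprit(x, att) <- hasCapability(x, att), hasMotive(x, att)
rules = [
    Rule("r_culprit", ("isCulprit", "x", "att"),
         (("hasCapability", "x", "att"), ("hasMotive", "x", "att"))),
]

base_evidence = {("hasCapability", "x", "att"), ("hasMotive", "x", "att")}
print(("isCulprit", "x", "att") in derive(rules, base_evidence))  # True
```

If only some premises are present, the rule simply does not fire; handling that case by assuming missing premises is where abduction comes in.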
The preference rules $\mathcal{P}$ are a set of labelled formulas of the form:

$$p : \mathit{rule}_1 > \mathit{rule}_2$$

where $p$ is the label denoting the rule name, the head of the rule is $\mathit{rule}_1 > \mathit{rule}_2$, $\mathit{rule}_1$ and $\mathit{rule}_2$ are labels of rules defined in $\mathcal{T}$, and $>$ refers to an irreflexive, transitive and antisymmetric higher-priority relation between rules. The above rule means that $\mathit{rule}_1$ has higher priority than $\mathit{rule}_2$, i.e., $\mathit{rule}_1$ is preferred over $\mathit{rule}_2$. The preference rules, also called priority rules, hold always or only under certain conditions or contexts. We show below a priority rule that is part of ABR, denoting that one ABR rule is preferred over another.
We define priority rules between rules that conflict with each other, i.e., rules that derive conflicting conclusions. Preference-based argumentation allows the analyst to handle non-monotonic reasoning in attribution, where the introduction of new evidence might change the result of the attribution (due to conflicting arguments) and the analyst’s confidence in the results. Argumentation is particularly useful as it represents the reasoning rules in an intuitive and simple way.
Let us introduce the following rule that is part of ABR:
This rule describes that an entity is not a possible culprit for an attack because it does not have the capabilities to perform it. It conflicts with the rule deriving that the entity is a possible culprit: if the preconditions of both rules are provided, they derive two conflicting conclusions. The preference rule above states which of the two rules is preferred; thus, when the preconditions of both rules are given, only the conclusion of the preferred rule is taken into consideration.
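The conflict resolution described above can be sketched in Python as follows. This is a deliberately simplified check, not the argumentation semantics that Gorgias implements: a conclusion is accepted only if its rule is preferred over every applicable rule deriving the complementary literal. The rule labels, predicate names and the chosen preference (lack of capability overrides a motive) are hypothetical, picked for illustration.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    label: str
    head: tuple
    body: tuple


def negate(lit):
    """Complement of a ground literal: p <-> ("neg", p)."""
    return lit[1] if lit[0] == "neg" else ("neg", lit)


def resolve(rules, facts, prefer):
    """Accept the head of each applicable rule that beats all applicable rivals.

    prefer is a set of (higher, lower) label pairs, read as higher > lower.
    If neither of two conflicting rules is preferred, both conclusions are
    left undecided (dropped) in this simplified sketch.
    """
    applicable = [r for r in rules if all(p in facts for p in r.body)]
    accepted = set()
    for r in applicable:
        rivals = [s for s in applicable if s.head == negate(r.head)]
        if all((r.label, s.label) in prefer for s in rivals):
            accepted.add(r.head)
    return accepted


culprit = ("isCulprit", "x", "att")
rules = [
    # hypothetical: x is a possible culprit if it has a motive
    Rule("r_motive", culprit, (("hasMotive", "x", "att"),)),
    # hypothetical: x is not a possible culprit if it lacks the capability
    Rule("r_noCapability", negate(culprit),
         (("neg", ("hasCapability", "x", "att")),)),
]
prefer = {("r_noCapability", "r_motive")}  # r_noCapability > r_motive

evidence = {("hasMotive", "x", "att"), ("neg", ("hasCapability", "x", "att"))}
print(resolve(rules, evidence, prefer))  # {('neg', ('isCulprit', 'x', 'att'))}
```

With only the motive as evidence, the first rule has no applicable rival and its conclusion is accepted, which mirrors the non-monotonic behaviour: adding evidence of missing capability flips the result.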
The inputs of ABR are pieces of evidence, called base evidence. The reasoning rules derive new evidence, called derived evidence, from the given base evidence together with the background knowledge. A piece of derived evidence is a predicate that is proved to be true when its premises are true. The reasoning rules and their preferences are extracted from the attribution of real cyber-attacks.
ABR is the first tool able to work with incomplete evidence. Thanks to abductive reasoning, it provides the user with hints of missing evidence or new investigation paths. Abducible predicates fill the knowledge gaps of the reasoning, allowing ABR to perform the analysis and reach a conclusion even when there is insufficient evidence. This feature is extremely important to the analyst, who is provided with new possible conclusions and new evidence to be collected. To construct ABR, we use the Gorgias tool, a preference-based argumentation reasoning tool that uses abduction.
Let us introduce the following rules from ABR:
which states that an entity has the motive to perform an attack if it has economic motives against the target of the attack, where the target is an industrial company, the context of the attack was economic, and the attack had a specific target. ABR treats the economic-motive predicate as an abducible predicate. For every abducible predicate, we have rules that derive the predicate or its negation. For this abducible, we can prove that it is not true by using the following rule.
If we are not able to derive the negation of the economic-motive predicate, then we can abduce that the predicate is true and use it to derive that the entity has the motive to perform the attack, provided the rest of the preconditions hold.
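The abduction step can be illustrated with a minimal Python sketch, assuming ground premises encoded as tuples. A missing premise is assumed only if its predicate is abducible and its negation is not known; in this simplification the negation is looked up among the facts rather than derived by rules as in ABR. The predicate names (`target`, `hasEconomicMotive`, `specificTarget`, `contextOf`) and the entity `x` are hypothetical, while `industry(infocomm)` mirrors Table 1.

```python
def prove_with_abduction(body, facts, abducibles):
    """Check the premises of one ground rule, assuming missing abducible ones.

    A premise holds if it is among the facts. A missing premise whose predicate
    is abducible is assumed, unless its negation ("neg", premise) is among the
    facts. Returns (proved, assumptions); the assumptions are the hints of
    missing evidence that would be suggested to the analyst.
    """
    assumptions = []
    for premise in body:
        if premise in facts:
            continue
        if premise[0] in abducibles and ("neg", premise) not in facts:
            assumptions.append(premise)
        else:
            return False, []
    return True, assumptions


# Hypothetical ground premises of the motive rule for entity x and attack att.
body = [
    ("target", "infocomm", "att"),
    ("industry", "infocomm"),
    ("hasEconomicMotive", "x", "infocomm"),
    ("specificTarget", "att"),
    ("contextOf", "att", "economic"),
]
facts = {("target", "infocomm", "att"), ("industry", "infocomm"),
         ("specificTarget", "att"), ("contextOf", "att", "economic")}

print(prove_with_abduction(body, facts, abducibles={"hasEconomicMotive"}))
# (True, [('hasEconomicMotive', 'x', 'infocomm')])
```

The returned assumption is exactly the kind of hint ABR gives the analyst: confirming or refuting the economic motive is the next piece of evidence worth collecting.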
3.2 Technical and Social Attribution
The main goal of the argumentation-based reasoner is to assist the forensic analyst during the evidence analysis. Given the pieces of evidence of a past or ongoing attack, the reasoner attributes the attack to one or more entities, or suggests to the user other pieces of evidence that can be provided in order to attribute the attack. To perform the attribution process, ABR also needs to work with non-technical pieces of evidence, usually called social evidence. In order to deal with technical and social evidence, we base our reasoner on a social model for attribution, the Q-Model. This model represents how analysts perform the attribution of cyber-attacks. Following the Q-Model, we categorize the evidence and the reasoning rules into three layers: technical, operational and strategic. We show some of ABR’s predicates divided into the three layers in Figure 2. Combining information across these layers enables the attribution of a cyber-attack, as it naturally emulates the analyst’s attribution process. Depending on the layer a rule or piece of evidence belongs to, we call it a technical, operational, or strategic rule/evidence, and its name starts with a corresponding layer-specific prefix.
The technical layer is composed of rules that deal with pieces of evidence obtained from digital forensics processes, relating to the technical evidence of the attack and how it was carried out, e.g., the IP address from which the attack originated, the time of the attack, logs, the type of attack, and the code used. Let us give below a technical layer reasoning rule that is part of ABR:
This rule denotes that if an attack uses zero-day vulnerabilities, then it requires a large amount of resources.
The operational layer is composed of rules that deal with non-technical pieces of evidence relating to the social aspects of the attack, e.g., the motives of the attack, the capabilities needed to perform it, and the political or economic context in which it took place. Let us give below an operational layer reasoning rule that is part of ABR:
This rule denotes that if an attack requires a large amount of resources and an entity has (large amounts of) resources, then the entity has the capability to carry out the attack.
The strategic layer is composed of rules that deal with who performed the attack, or who is taking advantage of it. Let us give below a strategic layer reasoning rule that is part of ABR:
This rule denotes that if an entity has both the capability and the motive to carry out an attack, then it is a possible culprit of the attack.
As shown in Figure 1, the operational rules use information derived from the technical layer, and the strategic rules use information derived from the technical and operational layers. All three layers use the evidence given by the user and the background knowledge. This categorization of evidence and rules into three layers, which follows the Q-Model, purposely emulates the forensic investigator’s reasoning during the attribution process, where s/he moves from the technical layer to the operational one, and finally to the strategic one, using the conclusions of the previous layers. Furthermore, this categorization improves ABR’s usability, given forensic investigators’ (i.e., ABR users’) familiarity with these three layers.
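The flow between the three layers can be sketched in Python with the three example rules of this section, one function per layer. This is an illustrative sketch under assumptions: the predicate names (`usesZeroDay`, `requiresHighResources`, `hasResources`, `hasCapability`, `hasMotive`, `isCulprit`) are hypothetical, and a single pass per layer is enough here only because each rule uses conclusions of earlier layers.

```python
# Hypothetical predicate names; evidence and derived facts are tuples.

def technical_layer(known):
    """Technical rule: an attack using zero-day vulnerabilities requires high resources."""
    return {("requiresHighResources", f[1]) for f in known if f[0] == "usesZeroDay"}


def operational_layer(known):
    """Operational rule: an entity with resources can carry out an attack that requires them."""
    attacks = {f[1] for f in known if f[0] == "requiresHighResources"}
    resourced = {f[1] for f in known if f[0] == "hasResources"}
    return {("hasCapability", e, a) for e in resourced for a in attacks}


def strategic_layer(known):
    """Strategic rule: an entity with both capability and motive is a possible culprit."""
    capable = {(f[1], f[2]) for f in known if f[0] == "hasCapability"}
    motivated = {(f[1], f[2]) for f in known if f[0] == "hasMotive"}
    return {("isCulprit", e, a) for (e, a) in capable & motivated}


def attribute(evidence):
    """Run the layers in order, each one seeing the conclusions of the previous ones."""
    known = set(evidence)
    known |= technical_layer(known)
    known |= operational_layer(known)
    known |= strategic_layer(known)
    return known


evidence = {("usesZeroDay", "att"), ("hasResources", "x"), ("hasMotive", "x", "att")}
print(("isCulprit", "x", "att") in attribute(evidence))  # True
```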
4 ABR Reasoning Rules
The main components of ABR are its reasoning rules, which perform the reasoning behind the attribution. We used the analysis process that forensic investigators performed on different cyber-attacks (e.g., APT1, WannaCry) to extract the reasoning rules that compose ABR. This reasoning is then translated into generic argumentation rules that form the reasoning rules. While ABR features around 200 rules, for conciseness, we discuss only a few of them in this paper.
As described in the previous section, the reasoning rules, also simply called rules, are divided into three layers: technical, operational and strategic. Let us give an overview of some of the strategic rules of the reasoner, taken from the set of all ABR rules, and show how the rules of the different layers are related to each other. The following rules describe some of the circumstances where we can prove that an entity is, or is not, a possible culprit of an attack.
Let us use one of these strategic rules to show the relations between reasoning rules of different layers.
This rule uses two predicates: the first is a derived predicate of the operational layer, indicating that an entity has motives to perform the attack; the second is a derived predicate of the technical and operational layers, indicating that the entity has the capabilities to perform the attack.
The motive predicate can be derived using the rule introduced in Section 3.1, shown again below:
The above rule says that an entity/country/group has the motives to perform an attack in case it has economic motives to attack a particular entity that is an industry, the attack was designed to target that entity, and the context of the attack was economic. The predicates used in the rule are: a piece of evidence stating that the entity is the target of the attack; a background fact stating that the target is an industry; a piece of evidence stating that a country benefits economically from attacking the industry (e.g., if the country has identified the industry as a strategic emerging industry, we consider this predicate true); a piece of evidence that is true when the attack was constructed to attack a particular target; and a piece of evidence about the context of the attack: if the target was a “normal”¹ industry, the context was economic, whereas if the target was a “political” industry, the context was political.

¹ “Normal” industries are industries that are not closely related to a country’s national interests. A “political” industry is an industry that is closely related to a country’s national interests, e.g., the military or energy sector.
We introduce below a rule that derives the capability predicate².

² For the sake of space, we introduce only one of the different rules that can derive this predicate.
This rule states that an entity has the capability to perform an attack when the attack requires high resources to be accomplished and the entity has the needed resources. The high-resources predicate can be derived from the following technical rules:
where the premises respectively state that the target of the attack has high security, that the attack has a high volume, that the attack was performed over a long duration (a few months or even years), and that the attack is complex and requires high-level skills to be performed. The first rule states that an attack requires high resources if its target has put in place high security measures, the second that it requires high resources if it has a high volume and a long duration, and the third that it requires high resources if it is a high-level-skill attack.
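The three alternative technical rules can be sketched in Python as a function that reports which rules fire for a given attack. The rule labels and the predicate names (`target`, `highSecurity`, `highVolume`, `longDuration`, `highLevelSkill`) are hypothetical stand-ins for ABR’s actual names, and the attack and target identifiers are made up for illustration.

```python
def requires_high_resources(att, facts):
    """Return the (hypothetical) labels of the technical rules that fire for att."""
    fired = []
    targets = {f[1] for f in facts if f[0] == "target" and f[2] == att}
    # Rule 1: the target has put in place high security measures.
    if any(("highSecurity", t) in facts for t in targets):
        fired.append("r_t_highSecurity")
    # Rule 2: the attack has a high volume and a long duration.
    if ("highVolume", att) in facts and ("longDuration", att) in facts:
        fired.append("r_t_volumeAndDuration")
    # Rule 3: the attack requires high-level skills.
    if ("highLevelSkill", att) in facts:
        fired.append("r_t_highSkill")
    return fired


facts = {("target", "bank", "attA"), ("highSecurity", "bank"),
         ("highVolume", "attB"), ("longDuration", "attB")}
print(requires_high_resources("attA", facts))  # ['r_t_highSecurity']
print(requires_high_resources("attB", facts))  # ['r_t_volumeAndDuration']
```

Any non-empty result means the high-resources predicate holds; returning the rule labels rather than a boolean keeps track of why, which is the kind of information ABR uses in its explanations.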
ABR’s rules are used to analyze the existing evidence and to derive new evidence, in order to offer new insights to the analyst, as shown in the example below.
Let us introduce a realistic cyber-attack example taken from the US bank hack that occurred in 2012, in which US banks faced denial of service (DoS) attacks that caused many banks’ websites to suffer slowdowns or even become unreachable for many customers. The banks’ web hosting services were infected by a high-level-skill malware called Itsoknoproblembro. Earlier that year, the US had placed economic sanctions against Iran. Some of the pieces of evidence of this attack are as below:
By using the third technical rule above and the evidence that this attack is a high-level-skill one, ABR derives that this attack requires high resources. Another ABR rule that can be applied in this example is the following one:
This rule derives that Iran has political motives against the US because of the sanctions imposed by the US against Iran.
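The two derivations of this example can be reproduced with a small Python sketch. The attack identifier `usBankHack` and the predicate names `highLevelSkill`, `imposedSanctions`, `requiresHighResources` and `hasPoliticalMotive` are hypothetical, and the sanction rule is encoded as described in the text: sanctions imposed by one country on another give the latter political motives against the former.

```python
def derive_example(facts):
    """Apply two illustrative rules and return only the newly derived facts."""
    derived = set()
    for f in facts:
        # Technical rule: a high-level-skill attack requires high resources.
        if f[0] == "highLevelSkill":
            derived.add(("requiresHighResources", f[1]))
        # Operational rule: sanctions imposed by country f[1] on country f[2]
        # give f[2] political motives against f[1].
        if f[0] == "imposedSanctions":
            derived.add(("hasPoliticalMotive", f[2], f[1]))
    return derived - set(facts)


evidence = {
    ("highLevelSkill", "usBankHack"),
    ("imposedSanctions", "united_states", "iran"),
}
for fact in sorted(derive_example(evidence)):
    print(fact)
# ('hasPoliticalMotive', 'iran', 'united_states')
# ('requiresHighResources', 'usBankHack')
```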
5 ABR Background Knowledge
Another important component of our reasoner is the background knowledge, which is composed of non-case-specific information divided into general knowledge and domain-specific knowledge. We give a list of some of the background knowledge predicates in Table 1. To the best of our knowledge, ABR is the first tool that incorporates and uses this type of knowledge for cyber-attack analysis and attribution. The background knowledge alleviates the analysts’ work and avoids human errors and bias, as the information is extracted from impartial sources. It is composed of pieces of information that the reasoning rules use as preconditions to answer the users’ queries. ABR’s background knowledge can be updated and enriched by the user.

| Example predicate | Description |
|---|---|
| industry(infocomm) | Type predicate for industries |
| norIndustry(infocomm) | Non-political industries |
| country(united_states) | Type predicate for countries |
| cybersuperpower(united_states) | List of cyber superpowers |
| gci_tier(afghanistan, initiating), gci_tier(poland, maturing), gci_tier(russian_federation, leading) | Global Cybersecurity Index (GCI) |
| firstLanguage(english, united_states) | First language used in the country |
| goodRelation(united_states, australia) | Good relations between countries |
| poorRelation(united_states, north_korea) | Poor relations between countries |
| prominentGroup(fancyBear) | Prominent hacker groups |
| groupOrigin(fancyBear, russian_federation) | Country of origin of a group |
| pastTargets(fancyBear, [france,…,poland]) | Past targets of a hacker group |
| malwareLinked(trojanMiniduke, cozyBear) | Past attribution of malware |
| malwareUsedInAttack(flame, flameattack), ccServer(gowin7, flame) | |
| domainRegisteredDetails(gowin7, adolph_dybevek, prinsen_gate_6) | Domain registration details of C&C server |
5.1 General Knowledge
The general knowledge consists of information about countries’ characteristics, international relations between nations, and the classification of types of industry. This information is used together with the given pieces of evidence to perform the analysis. Below we illustrate how these predicates are used by ABR’s rules.
Language indicators in malware can provide useful clues regarding the possible origin of attacks. We use two language artefacts: the default system language settings and the language used in the code. We present below two of ABR’s rules that use the language evidence to derive the attack’s possible origin when a country’s first language matches the one found in the system/code.
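The two language rules can be sketched in Python by matching the observed languages against `firstLanguage(Language, Country)` facts in the style of Table 1. The evidence predicates `sysLanguage` and `codeLanguage`, the attack identifiers, and every `firstLanguage` fact except the English/United States one are hypothetical, added only for the example.

```python
# Background knowledge in the style of Table 1: firstLanguage(Language, Country).
# Only a hypothetical subset is listed here.
FIRST_LANGUAGE = {
    ("english", "united_states"),
    ("chinese", "china"),
    ("russian", "russian_federation"),
}


def possible_origins(att, evidence, first_language=FIRST_LANGUAGE):
    """Countries whose first language matches the system or code language found in att."""
    languages = {f[1] for f in evidence
                 if f[0] in ("sysLanguage", "codeLanguage") and f[2] == att}
    return {country for (language, country) in first_language if language in languages}


evidence = {("sysLanguage", "chinese", "apt1"), ("codeLanguage", "english", "otherAttack")}
print(possible_origins("apt1", evidence))  # {'china'}
```

Since attackers can plant misleading language artefacts, a result like this is only one argument among others, which can be overridden through preferences.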
The cyber capability of a nation is another useful piece of information, as it limits the level of attacks an entity can possibly sponsor or carry out. To estimate countries’ cyber capabilities, we use the Global Cybersecurity Index (GCI) group. There are three GCI groups, leading, maturing and initiating, which we use to classify countries by their amount of resources. We show below two of ABR’s rules that use countries’ cyber capability.
A country has the resources to perform high-level-skill attacks if it is in the ‘leading’ GCI group. Countries in the ‘initiating’ GCI group are considered as lacking such resources.
Let us continue with the US bank hack introduced in Example 1. In the background knowledge, Iran is part of the leading GCI group. Thus, from the first rule above, we can derive that Iran has the resources to perform high-level-skill attacks. The operational rule then derives that Iran has the capabilities to perform the US bank hack, as shown below.
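The two GCI rules and the capability step of this example can be sketched in Python. The tiers for Russian Federation, Poland and Afghanistan come from Table 1 and Iran’s tier from the text; the function names and the three-valued result (`None` when neither rule applies, e.g., for ‘maturing’ countries) are assumptions of this sketch.

```python
# GCI tiers as in Table 1, plus Iran's tier as stated for Example 1.
GCI_TIER = {
    "russian_federation": "leading",
    "iran": "leading",
    "poland": "maturing",
    "afghanistan": "initiating",
}


def has_resources(country, gci_tier=GCI_TIER):
    """True for 'leading', False for 'initiating', None when neither rule applies."""
    tier = gci_tier.get(country)
    if tier == "leading":
        return True
    if tier == "initiating":
        return False
    return None


def has_capability(country, attack_requires_high_resources, gci_tier=GCI_TIER):
    """Operational rule: a high-resource attack plus an entity with resources gives capability."""
    return attack_requires_high_resources and has_resources(country, gci_tier) is True


print(has_capability("iran", attack_requires_high_resources=True))  # True
```

Keeping `None` distinct from `False` matters: an unknown resource level is a gap that abduction could fill, whereas ‘initiating’ actively argues against capability.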
Good international relations between two countries can indicate that a state-sponsored attack between them is unlikely. We encoded this information in ABR by creating a list of countries that have good relations with each other (goodRelation) and a list of countries that might have bad relations with each other (poorRelation) [39, 7]. This information is used to narrow down the countries that might or might not have a motive to carry out an attack, as shown by the following rule.
In the above rule, we derive that a country has no motive to perform an attack when it has good relations with the country targeted by that attack.
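A minimal Python sketch of how the good-relations rule narrows down candidates is given below. The `goodRelation(united_states, australia)` fact comes from Table 1; treating the relation as symmetric and the function names are assumptions of this sketch. A candidate that is not ruled out is not thereby shown to have a motive; other rules must still establish it.

```python
# goodRelation facts in the style of Table 1 (hypothetical subset).
GOOD_RELATIONS = {("united_states", "australia")}


def good_relation(a, b, relations=GOOD_RELATIONS):
    # Assumption: a good relation is treated as symmetric.
    return (a, b) in relations or (b, a) in relations


def candidates_with_possible_motive(candidates, target_country, relations=GOOD_RELATIONS):
    """Drop candidates ruled out by the good-relations rule; keep the rest in order."""
    return [c for c in candidates if not good_relation(c, target_country, relations)]


print(candidates_with_possible_motive(["australia", "north_korea"], "united_states"))
# ['north_korea']
```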
5.2 Domain-Specific Knowledge
Domain-specific knowledge consists of information about prominent groups of attackers and past attacks. These facts are primarily used in the strategic and technical layers. We encoded information on prominent APT groups taken from [12, 26], where for each group we have its name or ID; country of origin; countries/organisations targeted by the group in the past; and malware or pieces of malicious software (suspected or confirmed) linked to the group. We assume these groups have the capabilities to conduct long and large attacks, as shown by the rule below, which states that an entity has the capabilities to perform an attack if it is a prominent group of attackers.
Another important part of the domain-specific knowledge is the similarity between past attacks, e.g., similarity to an APT-linked malware indicates that the culprit might be the same APT group, as shown previously in rule .
In that rule, we derive that the attacker is most likely the entity linked to a similar malware, because the malware used in the attack is similar to another malware linked to that entity, and neither malware was found on the black market. We use a similarity predicate to denote that two malwares are similar to each other; e.g., in the rules below, two malwares are defined as similar if they use a similar code obfuscation mechanism, if they share code, or if one is derived by modifying the other.
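The malware-similarity rules and the attribution rule built on them can be sketched in Python. `malwareUsedInAttack` and `malwareLinked` follow the argument order of Table 1; the predicates `similarObfuscation`, `sharedCode`, `modifiedFrom` and `inBlackMarket`, the malware `newMalware` and the attack `attX` are hypothetical. Checking both argument orders is a simplifying assumption.

```python
def similar(m1, m2, facts):
    """Hypothetical similarity rules, checked in both argument orders."""
    for a, b in ((m1, m2), (m2, m1)):
        if (("similarObfuscation", a, b) in facts
                or ("sharedCode", a, b) in facts
                or ("modifiedFrom", a, b) in facts):
            return True
    return False


def likely_culprits(att, facts):
    """Groups linked to a malware similar to one used in att, if neither is on the black market."""
    used = {f[1] for f in facts if f[0] == "malwareUsedInAttack" and f[2] == att}
    linked = {(f[1], f[2]) for f in facts if f[0] == "malwareLinked"}
    black_market = {f[1] for f in facts if f[0] == "inBlackMarket"}
    return {group for m in used for (m2, group) in linked
            if similar(m, m2, facts) and m not in black_market and m2 not in black_market}


facts = {
    ("malwareUsedInAttack", "newMalware", "attX"),
    ("similarObfuscation", "newMalware", "trojanMiniduke"),
    ("malwareLinked", "trojanMiniduke", "cozyBear"),
}
print(likely_culprits("attX", facts))  # {'cozyBear'}
```

The black-market check matters because malware that can be bought is weak evidence of authorship: anyone could have used it.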
The similarity can also stem from both malwares having similar command and control (C&C) servers³, as shown below.

³ For the sake of simplicity, in this paper we introduce only some of the rules used by ABR to identify similarities between malwares or malicious pieces of software.
The similarity of the C&C servers of two different malwares can be derived using other technical rules of ABR, as shown below:
where the first rule derives that the C&C servers of two malwares are similar when both malwares use the same server; the second derives the similarity when the malwares use two different servers registered under the same address; and the third derives the similarity when they use two different servers registered under the same name.
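The three C&C similarity rules can be sketched in Python, reusing the Table 1 facts `ccServer(gowin7, flame)` and `domainRegisteredDetails(gowin7, adolph_dybevek, prinsen_gate_6)`, read as (server, malware) and (server, name, address). The second malware `malwareB`, its server `serverX` and the registrant `someone_else` are hypothetical, and returning a short rule description instead of a boolean is a design choice of this sketch.

```python
def cc_similarity(m1, m2, facts):
    """Return which (hypothetical) C&C similarity rule fires for m1 and m2, or None."""
    servers1 = {f[1] for f in facts if f[0] == "ccServer" and f[2] == m1}
    servers2 = {f[1] for f in facts if f[0] == "ccServer" and f[2] == m2}
    # domainRegisteredDetails(Server, Name, Address), as in Table 1
    registration = {f[1]: (f[2], f[3]) for f in facts if f[0] == "domainRegisteredDetails"}
    for s1 in servers1:
        for s2 in servers2:
            if s1 == s2:
                return "same server"
            if s1 in registration and s2 in registration:
                name1, address1 = registration[s1]
                name2, address2 = registration[s2]
                if address1 == address2:
                    return "same registration address"
                if name1 == name2:
                    return "same registrant name"
    return None


facts = {
    ("ccServer", "gowin7", "flame"),
    ("domainRegisteredDetails", "gowin7", "adolph_dybevek", "prinsen_gate_6"),
    ("ccServer", "serverX", "malwareB"),
    ("domainRegisteredDetails", "serverX", "someone_else", "prinsen_gate_6"),
}
print(cc_similarity("flame", "malwareB", facts))  # same registration address
```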
6 Evaluation and Discussions
ABR is a flexible tool designed to be part of an iterative process, where the user can add other pieces of evidence, rules or preferences after evaluating the results produced by ABR. ABR’s input is either given manually by the analyst or collected automatically by ABR through an extraction process that uses existing digital forensics tools.
We evaluated ABR’s performance and usability using various hybrid and realistic cyber-attack examples. During the evaluation, ABR correctly applied the reasoning rules and identified possible culprits of the attacks. The provided explanations, in textual and graphical form, improved the usability of ABR, as they provided information that the user exploited in the next iterations. ABR correctly answered the requested queries (e.g., whether a country had the motives or capabilities to perform the attack, or whether a particular group of attackers could be related to the attack given the technical evidence of the used malware) for the provided pieces of evidence.
For every tested scenario, we ran ABR using a subset of the known evidence. Depending on the use-case and the provided evidence, ABR was able to arrive at some conclusions by abducing some of the missing predicates. ABR provided interesting results when asked to suggest missing evidence, as it proposed useful missing evidence as well as new (unpredicted) investigation paths. When a large part of the known evidence of an attack was provided to ABR, its result coincided with the entity to whom the attack was attributed in the real world (or in the hybrid example), or this entity was contained in ABR’s list of possible culprits, which also included entities not previously considered as possible culprits.
Let us now briefly introduce some of the cyber-attacks used to evaluate ABR, together with its conclusions. For the sake of space, we show well-known attacks on which ABR was tested, as they do not need a detailed introduction.
ABR analyzed evidence of the gaussattack and attributed it to the equationGroup. The gaussattack represents the attack, active since 2011 and discovered in 2012, that used a virus called “Gauss”; it mainly targeted the Middle East, concentrating on Lebanese banks, stealing data and spying on bank transactions. ABR’s attribution was based on the similarities of this attack to Flame, Duqu, and Stuxnet, and on the fact that Flame was previously attributed to the equationGroup. This attribution coincides with the entity to whom the gaussattack was publicly attributed.
ABR was given evidence from the apt1 attack, which represents the data breach that started in 2004. ABR attributes this attack to China, based on the evidence that the majority of IP addresses from which the attack originated were located in China, that the attacker’s system default language, detected from the malware used, was Chinese, that the main group of victims were from the infocom industry, and that China had economic motives to attack this industry. Given the high volume and long duration of the attack, ABR, after analyzing all the provided evidence, attributed it to a state attack, specifically to China, and not to a group of hackers, i.e., not to the Advanced Persistent Threat (APT) group to whom the attack was publicly attributed.
ABR analyzed evidence of the Stuxnet attack and attributed it to two different possible actors, the US and Israel. The Stuxnet attack was first discovered in 2010, and its target was a uranium enrichment plant in Iran. Its code was complex, mainly targeted Iran, and exploited four zero-day vulnerabilities. ABR attributed the attack to the US and Israel given the high resources required to perform such a sophisticated attack, and the political conflicts that existed at the time between Iran and the US, and between Iran and Israel.
ABR analyzed evidence of the Sony attack. The Sony attack represents the 2014 attack in which hackers infiltrated Sony’s computers and stole data from Sony servers. A group called “Guardians of Peace” claimed credit for the attack, but several government agencies claimed that the attack was sponsored by North Korea. ABR attributed this attack to three possible culprits: the group of attackers called “Guardians of Peace” and two countries, Iran and North Korea. The attribution to Iran was a consequence of the poor diplomatic relations between the US and Iran. ABR’s results were unexpected with respect to the public attribution: they provide the analyst with new possible attackers, enriching the possible paths of investigation and helping to avoid human bias.
ABR analyzed evidence from the Conficker attack and was not able to attribute it. Instead, it suggested that the analyst provide evidence of hacker groups located in, or with interests in, Ukraine, as the first version of the attack was constructed to avoid machines with Ukrainian keyboards; or evidence of motivations for a nation-state attack or an economically motivated one, as the attack was sophisticated and could have been either a nation-state attack or the work of a cyber-criminal organization.
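Let us now show ABR’s final steps in the analysis and attribution of the attack from our running example. Following from Examples 1 and 2, ABR derived two predicates, stating that Iran has political motives and that these motives apply to the period of the attack. ABR can now apply the following operational rule: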
This rule permits ABR to derive that Iran has motives to perform the attack, as it has political motives and these motives apply to the period of the attack. Then, by applying a strategic rule, ABR derives that Iran is a possible culprit for this attack, which is in line with the public attribution. ABR provides this answer to the analyst together with its derivation tree, containing all the rules and evidence used.
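The operational and strategic steps of this derivation can be sketched as two small functions: one checks that a political motive covers the date of the attack, and the other lists every country for which such a motive holds. This is a hypothetical simplification: the `PoliticalMotive` record, the date interval used to model “the motives apply to the period of the attack”, and the country names are assumptions, and the strategic rule here relies on motive alone, whereas ABR’s strategic rules may combine further conditions and return a full derivation tree.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PoliticalMotive:
    # Hypothetical record: `country` has political motives against `target`
    # during the closed interval [start, end].
    country: str
    target: str
    start: date
    end: date


def has_motive(country: str, target: str, when: date, motives: list[PoliticalMotive]) -> bool:
    """Operational rule (sketch): a motive holds if a political motive of
    `country` against `target` covers the date of the attack."""
    return any(
        m.country == country and m.target == target and m.start <= when <= m.end
        for m in motives
    )


def possible_culprits(target: str, when: date, motives: list[PoliticalMotive]) -> list[str]:
    """Strategic rule (sketch): every country with a motive is a possible culprit."""
    countries = {m.country for m in motives}
    return sorted(c for c in countries if has_motive(c, target, when, motives))


motives = [
    PoliticalMotive("countryA", "countryB", date(2011, 1, 1), date(2013, 12, 31)),
    PoliticalMotive("countryC", "countryB", date(2005, 1, 1), date(2008, 12, 31)),
]
print(possible_culprits("countryB", date(2012, 9, 1), motives))  # ['countryA']
```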
Together with the answers to the queries, ABR also provides the different ways each result was derived, with an explanation composed of the rules and pieces of evidence used. The explanations are given in textual and graphical form. They make ABR’s decision and analysis process transparent to the user, and provide her/him with further information that can be used for the cyber-attack analysis. As the main goal of ABR is to help the analyst during the analysis process and provide her/him with useful information, when a query is called, ABR provides all the possible answers, each with an associated score that represents ABR’s level of confidence in that result.
ABR’s results include the missing evidence and suggestions about other appropriate investigation paths to be followed by the analyst. The suggested evidence can be collected by the analyst at a later stage and given to ABR as part of an iterative process. To keep ABR’s running time and complexity polynomial, we decided to provide only the first list of missing pieces of evidence, namely those connected to the given evidence, together with the conclusions that could be derived from them. Hence, ABR does not provide an exhaustive list of all the possible missing evidence. On the other hand, limiting the suggested pieces of evidence can be beneficial for the analyst, who can focus his/her attention on particular evidence instead of spending time and resources checking a full list of missing evidence.
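The restriction to missing evidence connected to the given evidence can be illustrated with a one-step abduction over ground rules: for every rule that has at least one body atom already known, the remaining body atoms are suggested as evidence to collect, together with the conclusion they would enable. This is a hypothetical sketch, not ABR’s abductive procedure; rules are simplified to propositional atoms, and the `Rule` and `suggest_missing` names are assumptions. Because it never chains through the suggested atoms, its cost is linear in the total size of the rule bodies.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    head: str              # conclusion derived by the rule
    body: tuple[str, ...]  # ground atoms that must all hold


def suggest_missing(rules: list[Rule], known: set[str]) -> list[tuple[str, list[str]]]:
    """One-step abduction: for each rule partly supported by the known evidence,
    return (conclusion, missing body atoms). Missing atoms are not expanded further."""
    suggestions = []
    for rule in rules:
        present = [a for a in rule.body if a in known]
        missing = [a for a in rule.body if a not in known]
        # Keep only rules connected to the given evidence whose conclusion is not yet known.
        if present and missing and rule.head not in known:
            suggestions.append((rule.head, missing))
    return suggestions


rules = [
    Rule("culprit(groupX)", ("similar(mw1,mw2)", "linked(mw2,groupX)", "not_black_market(mw1)")),
    Rule("motive(countryA)", ("political_conflict(countryA)", "in_period(attack)")),
    Rule("capability(countryC)", ("high_resources(countryC)",)),
]
known = {"similar(mw1,mw2)", "linked(mw2,groupX)", "in_period(attack)"}
for head, missing in suggest_missing(rules, known):
    print(head, "needs", missing)
# culprit(groupX) needs ['not_black_market(mw1)']
# motive(countryA) needs ['political_conflict(countryA)']
```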
ABR’s reasoning rules promote best practice and help share lessons learnt between analysts, as they are constructed from the analysts’ reasoning process during cyber-attack attribution. As the attribution process is mainly human-based, it is easily subject to human bias. In particular, the analyst, as a human, usually takes decisions based on past experience and on the resources invested so far. In some cases, it is difficult for the analyst to follow a new path of investigation because of the high resources invested in the old ones. ABR helps reduce the human bias in the attribution process by suggesting new paths of investigation to the user, some of which were not considered before.
ABR’s rules are extracted from existing attacks; thus, it can fail to deal with new evidence not encountered before and not included in the reasoning rules. As ABR uses information on past attacks and past attributions in its reasoning, if this information is false or a past attribution was wrong, the error can be propagated to ABR’s new results. For example, the Sony attack attribution was built on the (alleged) claim that North Korea was responsible for the assault on South Korean banks in 2013. This problem could be avoided by not using information on past attacks and attributions, but this would make the attribution more difficult or would require a larger amount of evidence. Furthermore, using the results of past attributions is a common practice among forensic analysts during their analysis and attribution process, as it allows them to identify existing groups of attackers and to use their modus operandi as an important factor for the attribution.
7 Conclusion and Future Work
In this work, we proposed an argumentation-based reasoner (ABR) that helps forensic analysts during the analysis and attribution of cyber-attacks. ABR is a novel tool constructed to automatically attribute cyber-attacks by leveraging both social and technical evidence, thus assisting the forensic analyst during his/her investigations. ABR provides explanations of the given results and hints about new investigation paths. To the best of our knowledge, this is the first automatic reasoner that attributes cyber-attacks by considering both technical and social evidence and that is also able to work with incomplete and conflicting evidence. The use of preference-based argumentation and abductive reasoning permits ABR to work with conflicting pieces of evidence and to fill the knowledge gaps that derive from incomplete ones. We introduced ABR’s main components, namely its reasoning rules (extracted from past cyber-attack attributions) and its background knowledge (general social and technical knowledge). Once the pieces of evidence are given to ABR, it provides the answers to the queries without the need for any manual tweaking. We improve ABR’s usability by categorizing the evidence and rules into three layers (by applying the Q-Model), as this follows a model the forensic analyst is familiar with. Our reasoner emphasizes the incremental and iterative nature of attribution by making the derivations of the solutions fully transparent to the forensic analyst.
As future work, we plan to increase ABR’s reasoning capabilities by adding new reasoning rules and background knowledge. In this work, we mainly focused on constructing the ABR reasoner and on its usability, and left the automatic population of the reasoning rules and background knowledge for future work. In particular, we plan to use NLP techniques to automatically extract the reasoning rules and social evidence used by forensic analysts. In order to fully automate the attribution and analysis process, we plan to enrich ABR with automatic forensic evidence extraction/collection and to integrate it with forensics tools and data mining techniques. We plan to test ABR with other real cyber-attacks, especially non-nation-state attacks, and to improve its usability using feedback from forensic analysts. Another interesting direction for future work is to include probabilities for our pieces of evidence and reasoning rules, in order to provide probabilistic measures for the given attributions.
Erisa Karafili was supported by the European Union’s H2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 746667.
-  Almeshekah, M.H., Spafford, E.H.: Planning and integrating deception into computer security defenses. In: NSPW. pp. 127–138. ACM (2014)
-  Almeshekah, M.H., Spafford, E.H.: Cyber security deception. In: Jajodia, S., Subrahmanian, V., Swarup, V., Wang, C. (eds.) Cyber Deception: Building the Scientific Foundation. pp. 23–50. Springer (2016)
-  Altman, A., Miller, Z.J.: Sony Hack: FBI Accuses North Korea in Attack That Nixed The Interview. http://time.com/3642161/sony-hack-north-korea-the-interview-fbi/ (2014)
-  Anagnostakis, K.G., Sidiroglou, S., Akritidis, P., Xinidis, K., Markatos, E.P., Keromytis, A.D.: Detecting targeted attacks using shadow honeypots. In: USENIX (2005)
-  Aziz, B.: Modelling and refinement of forensic data acquisition specifications. Digital Investigation 11(2), 90–101 (2014)
-  Beebe, N.: Digital forensic research: The good, the bad and the unaddressed. In: Advances in Digital Forensics V - Fifth IFIP WG 11.9 International Conference on Digital Forensics. pp. 17–36 (2009)
-  Brilliant Maps: Who Americans Consider Their Allies, Friends and Enemies. https://brilliantmaps.com/us-allies-enemies/ (2017)
-  da Cruz Nassif, L.F., Hruschka, E.R.: Document clustering for forensic analysis: an approach for improving computer inspection. IEEE Trans. Inf. Forensic Secur. 8(1), 46–54 (2013)
-  Carrier, B.: Defining digital forensic examination and analysis tools using abstraction layers. International Journal of Digital Evidence 1(4), 1–12 (2003)
-  Dung, P.M.: On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games. Artif. Intell. 77(2), 321–358 (1995)
-  Felmlee, D., Sprecher, S.: Close relationships and social psychology: Intersections and future paths. Social Psychology Quarterly 63, 365–376 (2000)
-  FireEye: Advanced Persistent Threat Groups. https://www.fireeye.com/current-threats/apt-groups.html
-  Fontani, M., Bianchi, T., Rosa, A.D., Piva, A., Barni, M.: A framework for decision fusion in image forensics based on Dempster-Shafer theory of evidence. IEEE Transactions on Information Forensics and Security 8(4), 593–607 (2013)
-  Goldman, D.: Major banks hit with biggest cyberattacks in history (2012), http://money.cnn.com/2012/09/27/technology/bank-cyberattacks/index.html
-  Goutam, R.K.: The problem of attribution in cyber security. Intern. J. of Computer Applications, Foundation of computer science 131(7), 34–36 (2015)
-  Hu, D., Zhang, X., Fan, Y., Zhao, Z.Q., Wang, L., Wu, X., Wu, X.: On digital image trustworthiness. Applied Soft Computing 48, 240–253 (2016)
-  International Telecommunication Union: Global Cybersecurity Index (GCI) 2017. https://www.itu.int/dms_pub/itu-d/opb/str/D-STR-GCI.01-2017-PDF-E.pdf (2017)
-  Kakas, A., Moraitis, P.: Argumentation based decision making for autonomous agents. In: AAMAS ’03. pp. 883–890 (2003)
-  Kakas, A.C., Kowalski, R.A., Toni, F.: Abductive logic programming. J. Log. Comput. 2(6), 719–770 (1992)
-  Kakas, A.C., Mancarella, P., Dung, P.M.: The acceptability semantics for logic programs. In: Intern. Conference on Logic Programming. pp. 504–519 (1994)
-  Karafili, E., Cristani, M., Viganò, L.: A formal approach to analyzing cyber-forensics evidence. In: ESORICS (1). Lecture Notes in Computer Science, vol. 11098, pp. 281–301. Springer (2018)
-  Karafili, E., Wang, L., Kakas, A.C., Lupu, E.: Helping forensic analysts to attribute cyber-attacks: An argumentation-based reasoner. In: PRIMA. vol. 11224, pp. 510–518. Springer (2018)
-  Kaspersky Lab Global Research and Analyst Team: Gauss: Abnormal Distribution. Tech. rep., Kaspersky Lab (2012), https://kasperskycontenthub.com/wp-content/uploads/sites/43/vlpdfs/kaspersky-lab-gauss.pdf
-  Kent, K., Chevalier, S., Grance, T., Dang, H.: SP 800-86. Guide to Integrating Forensic Techniques into Incident Response. Tech. rep., NIST (2006)
-  Mandiant: Exposing One of China’s Cyber Espionage Units. Tech. rep., Mandiant (2013)
-  Matin, S.: 8 Active APT Groups To Watch. https://www.darkreading.com/endpoint/8-active-apt-groups-to-watch/d/d-id/1325161 (2016)
-  Morgan, S.: Top 5 cybersecurity facts, figures and statistics for 2017. https://www.csoonline.com/article/3153707/security/top-5-cybersecurity-facts-figures-and-statistics-for-2017.html
-  National Audit Office: Investigation: WannaCry cyber attack and the NHS (2017)
-  Newman, L.H.: The Biggest Cybersecurity Disasters of 2017 So Far. https://www.wired.com/story/2017-biggest-hacks-so-far/ (2017)
-  Nunes, E., Shakarian, P., Simari, G.I.: Toward argumentation-based cyber attribution. In: AAAI Workshops (2016)
-  Ouerdane, W., Maudet, N., Tsoukias, A.: Argumentation theory and decision aiding. In: Trends in Multiple Criteria Decision Analysis, pp. 177–208. Springer (2010)
-  Rid, T., Buchanan, B.: Attributing Cyber Attacks. Journal of Strategic Studies 38(1-2), 4–37 (2015)
-  Roman, J.: FBI Defends Sony Hack Attribution. https://www.bankinfosecurity.com/sony-a-7762 (2015)
-  Sattler, J.: What we’ve learned from 10 years of the Conficker mystery (2019), https://blog.f-secure.com/what-weve-learned-from-10-years-of-the-conficker-mystery/
-  Schatz, B.L.: Wirespeed: Extending the AFF4 forensic container format for scalable acquisition and live analysis. Digital Investigation 14, S45–S54 (2015)
-  Shakarian, P., Simari, G.I., Moores, G., Paulo, D., Parsons, S., Falappa, M.A., Aleali, A.: Belief revision in structured probabilistic argumentation - model and application to cyber security. Ann. Math. Artif. Intell. 78(3-4), 259–301 (2016)
-  Vidas, T., Kaplan, B., Geiger, M.: OpenLV: Empowering investigators and first-responders in the digital forensics process. Digital Investigation 11, S45–S53 (2014)
-  Wheeler, D.A., Larsen, G.N.: Techniques for cyber attack attribution. Tech. rep., Institute for Defense Analyses Alexandria VA (2003)
-  YouGov: America’s Friends and Enemies. https://today.yougov.com/topics/politics/articles-reports/2017/02/02/americas-friends-and-enemies (2017)
-  Zetter, K.: Countdown to Zero Day: Stuxnet and the launch of the world’s first digital weapon. Crown Publishing Group, New York City, USA (2014)