1. Introduction
Defending networks and information assets from attack in a constantly evolving threat landscape remains a substantial challenge in our modern, connected world. As better detection and response methods are developed, attackers invariably adapt their tools and techniques to remain competitive. One adaptation is the use of advanced automation to perform attack sequences too quickly for defenders to respond. An example is the NotPetya malware from 2017, which spread very rapidly using credential dumping and lateral movement techniques usually expected of human-on-keyboard attacks (see, e.g., Greenberg, 2018).
Partial or even full automation of attacks is nothing new. Many exploits and techniques consist of discrete steps, and they can be easily scripted. Frameworks such as Metasploit, OpenVAS, Cobalt Strike, and PowerShell Empire support and automate red teaming activities. However, this type of automation relies on several assumptions about the target environment and often requires a human to configure it for a specific scenario. Moreover, the highly predictable sequences of observable events generated by automated attacks give a further advantage to defenders. Hence, these kinds of attacks can often be detected and responded to efficiently. In contrast, a human expert might determine the optimal approach for a given target almost immediately and avoid detection, thanks to their accumulated experience.
Can the human approach be emulated by intelligent machine learning agents that learn from experience, instead of a user having to enumerate all possibilities and create logic trees to account for every alternative? There would be many potential use cases for an intelligent machine learning red teaming agent. Training comprehensive defensive systems with machine learning might require enormous amounts of realistic attack data. An intelligent agent enables generating large amounts of attack data on demand, which could be used to train or refine detection models. Moreover, to understand how to defend against attacks and perform risk assessment, the cyber security community must understand the potential behavior and threats posed by such machine learning-based agents.
Despite a fear of malicious actors using reinforcement learning for offensive purposes and the significant advances in deep reinforcement learning during the past few years, few studies on red teaming with reinforcement learning have been published. The most likely reason for this is that the problem of learning to perform an attack is extremely hard:
- a complete attack typically consists of a long sequence of interdependent steps;
- the action space is practically infinite if the actions are the commands that the agent can execute;
- even formalizing red teaming activities as machine learning problems can be extremely challenging.
Therefore, existing research has focused on automating smaller sub-tasks such as initial access (Takaesu, 2018) or lateral movement during post-exploitation (Maeda and Mimura, 2021).
DeepExploit (Takaesu, 2018) is a deep reinforcement learning agent that is trained to automate gaining initial access using known vulnerabilities and exploits. It is built on the Metasploit framework (Rapid7, 2021). After a successful penetration, it tries to recursively gain access to other hosts in the local network of the given input IP address. DeepExploit is primarily a framework for penetration testing, and its support for post-exploitation activities is very limited: the agent treats lateral movement as a second initial access task.
The deep RL agent of Maeda and Mimura (Maeda and Mimura, 2021) is a step forward in emulating adversarial behavior in real environments: it is trained to perform lateral movement in Windows domains. The authors train the agent using the modules of PowerShell Empire as the action space. The state of the proposed agent consists of ten entries, such as the number of discovered computers in the network, the number of compromised computers, and whether the agent has local administrative privileges. The authors demonstrate that the reinforcement learning agent can learn to perform lateral movement and obtain domain controller privileges.
In this work, we present one potential use case for an intelligent machine learning red teaming agent: we use deep RL to automate the task of local privilege escalation. Privilege escalation is the typical first step performed by an attacker after gaining initial access, and it is often followed by lateral movement to other hosts in the penetrated network. We consider privilege escalation in Windows 7 environments, which may have an arbitrary number of system components (such as services, DLLs, and tasks). We propose a formalization of the privilege escalation task as a reinforcement learning problem and present a novel architecture of an actor-critic RL agent. We experimentally show that the proposed agent can learn to perform the privilege escalation task.
Although we focus on one sub-task performed by a malicious actor, the learning problem that we consider is significantly harder compared to previous works (Takaesu, 2018; Maeda and Mimura, 2021):
- Privilege escalation needs a sequence of actions, the selection of which depends on the changing system state. The scenario of (Takaesu, 2018), for example, has one-step solutions without any changes in the system state.
- Privilege escalation can be accomplished by multiple different strategies.
- The attacked system can have a varying number of system components (services, DLLs, tasks), and the agent should generalize to any number of those.
- Our training setup is more realistic and diverse compared to (Maeda and Mimura, 2021). Instead of attacking a system whose variability is implemented by noise in system parameters, we use different system configurations in each training episode.
Thus, our study takes a step towards solving more complex red teaming tasks with artificial intelligence.
2. Related Work
Applying reinforcement learning in cyber security has been a subject of much recent research (Nguyen and Reddi, 2019). Examples of application areas include, among others, anti-jamming communication systems (Han et al., 2017), spoofing detection in wireless networks (Xiao et al., 2015), phishing detection (Chatterjee and Namin, 2019), autonomous cyber defense in software-defined networking (Han et al., 2018), mobile cloud offloading for malware detection (Wan et al., 2017), botnet detection (Alauthman et al., 2020), security in mobile edge caching (Xiao et al., 2018), and security in autonomous vehicle systems (Ferdowsi et al., 2018). In addition, reinforcement learning has been applied to research of physical security, such as grid security (Ni and Paul, 2019), and to green security games (Wang et al., 2019).
Previously, multi-agent reinforcement learning has been applied to cyber security simulations with competing adversarial and defensive agents (Bland et al., 2020; Elderman et al., 2017; He et al., 2016). It has been shown that both the attacking and the defending reinforcement learning agents can learn to improve their performance. The success of multi-agent reinforcement learning might have wider implications for information security research even though these simulation-based studies are not directly applicable to real environments.
There have also been attempts to apply reinforcement learning to penetration testing (Takaesu, 2018; Ghanem and Chen, 2018; Caturano et al., 2021; Chowdhary et al., 2020; Zennaro and Erdodi, 2021; Ghanem and Chen, 2020). The results of these efforts suggest that reinforcement learning can support the human in charge of the penetration testing process (Ghanem and Chen, 2018; Caturano et al., 2021). Reinforcement learning has also been applied to planning the steps following the initial penetration by learning a policy in a simulated version of the environment (Chowdhary et al., 2020). Penetration testing (Zennaro and Erdodi, 2021) and web hacking (Erdődi and Zennaro, 2021) have also been converted to simulated capture-the-flag challenges that can be solved with reinforcement learning. Finally, reinforcement learning has been applied to attacking static Portable Executable (PE) malware detection models (Anderson et al., 2018) and even supervised learning-based anti-malware engines (Fang et al., 2019b). The trained agents are capable of modifying the malware to evade detection.

Reinforcement learning has also been applied to blue teaming. Microsoft has developed a research toolkit called CyberBattleSim, which enables modeling the behavior of autonomous agents in a high-level abstraction of a computer network (Team, 2021). Reinforcement learning agents that operate in the abstracted network can be trained using the framework. The objective of the platform is to create an understanding of how malicious reinforcement learning agents could behave in a network and how reinforcement learning can be used for threat detection. Deep reinforcement learning can also be applied to improving feature selection for malware detection (Fang et al., 2019a).

Non-learning-based approaches to automating adversary emulation have been developed as well. Caldera is a framework capable of automatically planning adversarial actions against Windows enterprise networks. Caldera uses an inbuilt model of the structure of enterprise domains and knowledge about the objectives and potential actions of an attacker. Then, an intelligent heuristics-based planner decides which adversarial actions to perform (Applebaum et al., 2016). Moreover, several different non-RL-based AI approaches to penetration testing and vulnerability analysis have been proposed (McKinnel et al., 2019).

Supervised and unsupervised learning, with and without neural networks, have been applied for blue teaming purposes. For instance, malicious PowerShell commands can be detected with novel deep learning methods (Hendler et al., 2018). An ensemble detector combining an NLP-based classifier with a CNN-based classifier was the best at detecting malicious commands, and the detection performance was high enough to be useful in practice. The detector was evaluated using a large dataset consisting of legitimate commands executed by standard users, malicious commands executed by malware, and malicious commands designed by security experts. The suitability of machine learning for intrusion detection, malicious code detection, malware analysis, and spam detection has also been discussed (Apruzzese et al., 2018; Cui et al., 2018; Kim et al., 2018). These methods often rely on extensive feature engineering (Kim et al., 2018; Çavuşoğlu, 2019). In defensive tasks, good performance can often be achieved without deep neural networks, with solutions like logistic regression, support vector machines, and random forests (Milosevic et al., 2017). Machine learning-based systems are vulnerable to adversarial attacks (Chen et al., 2019), and as different ML-based techniques have dissimilar weaknesses, a combination of machine learning techniques is often necessary (Apruzzese et al., 2018; Çavuşoğlu, 2019).

3. Reinforcement learning
Reinforcement learning is one of the three main paradigms of machine learning alongside supervised and unsupervised learning (Sutton and Barto, 2018). In reinforcement learning, an agent interacts with an environment over discrete time steps to maximize its long-run reward. At a given time step $t$, the environment has state $s_t$ and the agent is given an observation $o_t$ and a reward signal $r_t$. If the environment is fully observable, the observation is equal to the environment state, $o_t = s_t$. In a more general scenario, the agent receives only a partial observation $o_t$ which does not represent the full environment state. In this case, the agent has its own state $\tilde{s}_t$ which might differ from the environment state $s_t$. The agent selects an action $a_t$ from the set of possible actions $\mathcal{A}$ and acts in the environment. The environment transitions to a new state $s_{t+1}$ and the agent receives a new observation $o_{t+1}$ and a new reward $r_{t+1}$. The goal of the agent is to maximize the sum of the collected rewards $R_t = \sum_{k \ge 0} \gamma^k r_{t+k+1}$, where $\gamma \in (0, 1]$ is a discount factor used to discount future rewards (Mnih et al., 2016).
In this paper, we use a model-free approach to reinforcement learning in which the agent does not build an explicit model of the environment. The agent selects an action according to a policy function $\pi(a_t \mid \tilde{s}_t)$ which depends on the agent state $\tilde{s}_t$. We use an algorithm called the advantage actor-critic (A2C) (Mnih et al., 2016) in which the policy is parameterized as $\pi_\theta(a_t \mid \tilde{s}_t)$. The parameters $\theta$ of the policy are updated in the direction of

$$\nabla_\theta \log \pi_\theta(a_t \mid \tilde{s}_t)\, A(\tilde{s}_t, a_t), \tag{1}$$

where $A(\tilde{s}_t, a_t)$ is an advantage function which estimates the (relative) benefit of taking action $a_t$ in state $\tilde{s}_t$ in terms of the expected total reward. In A2C, the advantage function is computed as

$$A(\tilde{s}_t, a_t) = R_t - V_\omega(\tilde{s}_t),$$

where $V_\omega(\tilde{s}_t)$ is the state-value function which estimates the expected total reward when the agent starts at state $\tilde{s}_t$ and follows policy $\pi$:

$$V_\omega(\tilde{s}_t) \approx \mathbb{E}_\pi\left[ R_t \mid \tilde{s}_t \right].$$

We update the parameters $\omega$ of the value function using Monte Carlo estimates of the total discounted rewards $R_t$ as the targets, using the Huber loss (Huber, 1992). In practice, most of the parameters $\theta$ and $\omega$ are shared (see Section 5.2 and Figure 1).
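A minimal PyTorch sketch of this update rule, assuming the log-probabilities, value estimates, and rewards of one episode have been stored in lists; the function name and the exact loss weighting are our own choices, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def a2c_update(optimizer, log_probs, values, rewards, gamma=0.995):
    """One A2C update from a single episode, using Monte Carlo returns as value targets."""
    # Discounted returns R_t, computed backwards over the episode.
    returns, R = [], 0.0
    for r in reversed(rewards):
        R = r + gamma * R
        returns.insert(0, R)
    returns = torch.tensor(returns)

    values = torch.stack(values).squeeze(-1)        # V_w(s_t) from the value head
    log_probs = torch.stack(log_probs)              # log pi_theta(a_t | s_t)
    advantages = returns - values.detach()          # A(s_t, a_t) = R_t - V_w(s_t)

    policy_loss = -(log_probs * advantages).sum()   # ascent along Eq. (1) as descent on its negative
    value_loss = F.smooth_l1_loss(values, returns)  # Huber loss (Huber, 1992)

    optimizer.zero_grad()
    (policy_loss + value_loss).backward()
    optimizer.step()
```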
4. Privilege escalation as a reinforcement learning task
4.1. Problem Definition
In this work, we focus on automating one particular step often performed by red teaming actors: local privilege escalation in Windows. For our reinforcement learning agent, there will be three possible paths to success:
- Add the current user as a local administrator.
- Obtain administrative credentials.
- Overwrite a program that is executed with elevated privileges when a user or an administrator logs on.
The first alternative is hardly how a true red teaming actor would approach the problem, as changes in the local administrators of a workstation are easily detectable by any advanced detection and response system. However, if the agent succeeds at it, it demonstrates that the agent can, with some exceptions, execute arbitrary code with elevated privileges on the victim host. The second alternative is a more realistic way of performing local privilege escalation. The third method is arguably inferior to the other two, as it requires the attacker to wait for the system to be rebooted or for some other event that triggers the scheduled task or the AutoRun.
4.2. Learning Environment
The learning environment is a simulated Windows 7 environment with a random non-zero number of services, tasks, and AutoRuns. In each training episode, we introduce one vulnerability in the simulated system by selecting randomly from the following 12 alternatives:
- hijackable DLL (a missing or a writable DLL)
- re-configurable service
- unquoted service path
- modifiable ImagePath in the service registry
- writable executable pointed to by a service
- missing service binary and a writable service folder
- writable binary pointed to by an AutoRun
- AlwaysInstallElevated bits set to one
- credentials of a user with elevated access in the WinLogon registry
- credentials of a user with elevated access in an Unattend file
- writable binary pointed to by a scheduled task running with elevated privileges
- writable Startup folder
To increase the variability of the environment states, we also randomly add services, tasks, and AutoRuns that might initially seem vulnerable to the agent. For instance, a service with one of the service-specific vulnerabilities above but without elevated privileges or a service with a writable parent folder but without an unquoted path can be added. Moreover, standard user credentials might be added to the registry, or a folder on the Windows path might be made writable, among others.
To train an autonomous reinforcement learning agent to perform local privilege escalation, we need to formalize the learning problem, that is, we need to define the reward function, the action space $\mathcal{A}$, and the space of the agent states $\tilde{s}$.
Defining the reward function is perhaps the easiest task. We selected the simplest possible reward structure without any reward shaping: the agent is given a positive reward for the final action of the episode if the privilege escalation has been performed successfully, and a zero reward otherwise. Based on our experiments, this simple sparse reward signal is sufficient for teaching the agent to perform privilege escalation with as few actions as possible because the reward is progressively discounted as more steps are taken by the agent. We also experimented with giving the agent only half of the full reward for performing privilege escalation by the third, arguably inferior method, but the agent then had trouble learning the desired behavior.
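As a back-of-the-envelope illustration of why discounting alone favors short episodes, assume a unit terminal reward (the exact reward magnitude is our assumption) and the discount factor $\gamma = 0.995$ from Appendix A; the discounted return credited to the first action of an episode solved in $T$ steps is then

```latex
G_1 = \gamma^{T-1} r_T \approx
\begin{cases}
0.995^{10} \approx 0.95 & \text{for } T = 11 \text{ (near-optimal policy)} \\
0.995^{199} \approx 0.37 & \text{for } T = 200 \text{ (early training)}
\end{cases}
```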
The state of the environment and its dynamics are determined by the Windows 7 environment (or its simulator) that the agent interacts with. The environment is only partially observable: the observations are the outputs of the commands that the agent executes. Working with such a rich observation space is difficult, and therefore, we have designed a custom procedure that converts the observations into the agent state $\tilde{s}_t$. It is the agent state that is used as the input of the policy and value functions. We also manually designed a set of high-level actions that the agent needs to choose from. We describe the agent state and the action space in the following sections.
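The conversion procedure itself is custom and not spelled out here; as a rough illustration under our own assumptions, the CSV output of the wmic command used by action A31 (see Appendix C) could be parsed into per-service records whose Table 2 attributes start out as unknown:

```python
import csv
import io

def parse_service_list(wmic_csv_output: str):
    """Parse the CSV output of A31 (wmic service get ... /format:csv) into service records."""
    rows = csv.DictReader(io.StringIO(wmic_csv_output.strip()))
    services = []
    for row in rows:
        services.append({
            "name": row.get("Name"),
            "path": row.get("PathName"),
            "user": row.get("StartName"),
            "running": 1 if (row.get("Started") or "").upper() == "TRUE" else -1,
            # Remaining Table 2 attributes (unquoted path, writable binary, ...) start as unknown (0)
            # and are filled by later actions such as A25 and A28.
            "attributes": {"elevated": 0, "unquoted_path": 0, "writable_parent": 0,
                           "writable_exe": 0, "reconfigurable": 0},
        })
    return services
```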
Table 1. The general information variables of the agent state.

Trinary variables (true/unknown/false):
(1) Are there credentials in files?
(2) Do the credentials in the files belong to users with elevated privileges?
(3) Are there credentials in the registry?
(4) Do the credentials in the registry belong to users with elevated privileges?
(5) Is there a writable folder on the Windows path?
(6) Are the AlwaysInstallElevated bits set?
(7) Can the AutoRuns be enumerated using an external PowerShell module?

Binary variables (true/false):
(8) Has a malicious executable been created in Kali Linux?
(9) Has a malicious service executable been created in Kali Linux?
(10) Has a malicious DLL been created in Kali Linux?
(11) Has a malicious MSI file been created in Kali Linux?
(12) Has a malicious executable been downloaded?
(13) Has a malicious service executable been downloaded?
(14) Has a malicious DLL been downloaded?
(15) Has a malicious MSI file been downloaded?
(16) Does the agent know the list of local users?
(17) Are there users whose privileges need to be checked?
(18) Does the agent know the services running on the OS?
(19) Does the agent know the scheduled tasks running on the OS?
(20) Does the agent know the AutoRuns of the OS?
(21) Has the agent performed a static analysis of the service binaries to detect DLLs?
(22) Have the DLLs loaded by the service binaries been searched?
(23) Are there folders whose permissions must be checked?
(24) Are there executables whose permissions must be checked?
(25) Does the agent know the current username?
(26) Does the agent know the Windows path?
(27) Are there base64-credentials to decode?
4.3. State of the Agent
We update the state of the agent by keeping the information that is relevant for the task of privilege escalation. The agent state includes variables that contain general information about the system, information about discovered services, dynamic-link libraries (DLLs), AutoRun registry, and scheduled tasks.
The general information is represented by the 27 variables listed in Table 1. Seven of these variables are trinary (true/unknown/false) and contain information useful for the task of privilege escalation. The remaining 20 variables are binary (true/false), and they also contain information about the previous actions of the agent. The previous actions are included in the state to make the agent state as close to Markov as possible, which makes the training easier.
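A minimal sketch of how these 27 variables could be flattened into the numeric input vector of the neural network; the key names and the binary encoding are our own assumptions, while the trinary values follow the true (+1) / unknown (0) / false (-1) convention described in Section 4.3:

```python
import numpy as np

TRINARY_KEYS = [  # 7 true/unknown/false variables of Table 1
    "creds_in_files", "file_creds_elevated", "creds_in_registry",
    "registry_creds_elevated", "writable_path_folder",
    "always_install_elevated", "autoruns_enumerable",
]
BINARY_KEYS = [  # 20 true/false variables of Table 1, including previous actions
    "exe_created", "svc_exe_created", "dll_created", "msi_created",
    "exe_downloaded", "svc_exe_downloaded", "dll_downloaded", "msi_downloaded",
    "knows_local_users", "users_to_check", "knows_services", "knows_tasks",
    "knows_autoruns", "static_analysis_done", "dlls_searched",
    "folders_to_check", "exes_to_check", "knows_username",
    "knows_windows_path", "base64_creds_to_decode",
]

def encode_general_state(state: dict) -> np.ndarray:
    """Encode the 27 general state variables as a fixed-length float vector (shape (27,))."""
    trinary = [float(state.get(k, 0)) for k in TRINARY_KEYS]       # +1 / 0 / -1
    binary = [1.0 if state.get(k, False) else 0.0 for k in BINARY_KEYS]
    return np.asarray(trinary + binary, dtype=np.float32)
```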
Table 2. Trinary attributes of the discovered services, DLLs, AutoRuns, and scheduled tasks.

| Component | Attribute |
|---|---|
| Service | Is the service running? |
| | Is the service run with elevated privileges? |
| | Is the service path unquoted? |
| | Is there a writable parent folder? |
| | Is there whitespace in the service path? |
| | Is the service binary in C:\Windows? |
| | Can the service executable be written? |
| | Can the service be re-configured? |
| | Can the service registry be modified? |
| | Does the service load a vulnerable DLL? |
| | Has the service been exploited? |
| DLL | Is the DLL missing? |
| | Is the DLL writable? |
| | Has the DLL been replaced with a malicious DLL? |
| AutoRun | Is the AutoRun file writable? |
| | Is the AutoRun file in C:\Windows? |
| Task | Is the task run with elevated privileges? |
| | Is the executable writable? |
| | Is the executable in C:\Windows? |

Table 3. Examples of the auxiliary information maintained by the agent.

| Component | Stored information |
|---|---|
| Service | Name, executable path, user |
| Executable | Path, linked DLLs |
| DLL | Name, calling executable, path |
| AutoRun | Executable path, trigger |
| Task | Name, executable path, trigger, user |
| Credentials | Username, password, plaintext |
| File system | Folders, executables, permissions |
At the beginning of each training episode, the agent has no knowledge of the services running on the host. The agent has to collect a list of services by taking the action A31 Get a list of services. Once a service is detected, it is described by its name, full path, the owning user and the 11 trinary attributes listed in Table 2. Each of these attributes can have three possible values: true (+1), unknown (0), and false (-1). Then, the agent needs to perform actions such as A25 Check service permissions with accesschk64 to fill the values of the unknown attributes.
Table 4. The action space of the agent.

A1. Create a malicious executable in Kali Linux
A2. Create a malicious service executable in Kali Linux
A3. Compile a custom malicious DLL in Kali Linux
A4. Create a malicious MSI in Kali Linux
A5. Download a malicious executable in Windows
A6. Download a malicious service executable in Windows
A7. Download a malicious DLL in Windows
A8. Download a malicious MSI in Windows
A9. Start an exploited service
A10. Stop an exploited service
A11. Overwrite the executable of an AutoRun
A12. Overwrite the executable of a scheduled task
A13. Overwrite a service binary
A14. Move a malicious executable so that it is executed by an unquoted service path
A15. Overwrite a DLL
A16. Move a malicious DLL to a folder on Windows path to replace a missing DLL
A17. Re-configure service to use a malicious executable
A18. Re-configure service to add the user to local administrators
A19. Change service registry to point to a malicious executable
A20. Change service registry to add the user to local administrators
A21. Install a malicious MSI file
A22. Search for unattend* sysprep* unattended* files
A23. Decode base64 credentials
A24. Test credentials
A25. Check service permissions with accesschk64
A26. Check the ACLs of the service registry with Get-ACL
A27. Check executable permissions with icacls
A28. Check directory permissions with icacls
A29. Analyze service executables for DLLs
A30. Search for DLLs
A31. Get a list of services
A32. Get a list of AutoRuns
A33. Get a list of scheduled tasks
A34. Check AlwaysInstallElevated bits
A35. Check for passwords in Winlogon registry
A36. Get a list of local users and administrators
A37. Get the current user
A38. Get the Windows path
Since local privilege escalation can be performed by DLL hijacking, we also include the information about the DLLs used by the services in the state. Each DLL is described using a set of attributes listed in Table 2. This information is added to the state after taking action A29 Analyze service executables for DLLs.
Privileges can be elevated in Windows by using vulnerable executables in the AutoRun registry and misconfigured scheduled tasks. Therefore, we add information about the AutoRun files and the scheduled tasks to the agent state. Each AutoRun file and each scheduled task is described using the trinary attributes defined in Table 2.
In addition to the variables defined in Table 1 and Table 2, the agent maintains a collection of auxiliary information in its memory. The information is needed to fill the arguments of the commands executed by the agent. Examples of the auxiliary information are given in Table 3. This information is gathered and updated based on the observations, that is, the outputs of the commands performed by the agent. The auxiliary information is not given as input to the neural network, and hence, it affects neither the policy nor the value directly.
4.4. Action Space
We designed the action space of the agent by including actions needed for gathering information about the victim Windows host and performing the privilege escalation techniques. The action space consists of 38 actions listed in Table 4. Although the action space is crafted for known privilege escalation vulnerabilities (which we consider unavoidable within the constraints of the current RL), there is no one-to-one relationship between the actions and vulnerabilities. Some actions are only relevant for specific vulnerabilities, whereas many others are more general and can be used in multiple scenarios (see Appendix B). Our general principle in constructing the action space has been to make the actions as atomic as possible while keeping the problem potentially solvable by the current RL.
The actions are defined on a high level, which means that their exact implementation can vary, for example, depending on the platform. For instance, action A29 Analyze service executables for DLLs
can be implemented by static analysis of the Portable Executable files with an open-source analyzer to detect the loaded DLLs. The same action can be implemented using a custom analyzer or a script to download the executable and analyze it with Process Monitor. Our high-level action definition enables modifying the low-level implementations of the actions, such as changing the frameworks used, without affecting the trained agent. To create the necessary malicious executables, we use Kali Linux with Metasploit. The malicious DLLs needed for performing DLL hijacking are compiled manually. However, the low-level implementation of these commands can easily be changed if desired.
Each of the high-level actions is well-specified and can be performed using only a handful of standard Windows (cmd.exe and PowerShell) and Linux (zsh) commands. Many of the commands need arguments. For instance, to take the action A9 Start an exploited service, the name of the service must be specified. In this work, we automatically fill the arguments using the auxiliary information collected as discussed in Section 4.3. For example, one of the actions defined is A28 Check directory permissions with icacls. The agent maintains an internal list of directories that are of interest, and when the action to analyze the permissions of directories is selected, every directory on the list is scanned.
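As an illustration of this argument-filling mechanism, the following sketch shows how action A28 could be expanded into concrete icacls commands from an auxiliary memory of interesting folders; the helper names (ssh_exec, aux_memory) are hypothetical, and only the icacls.exe invocation itself follows Appendix C.

```python
def run_action_A28(ssh_exec, aux_memory):
    """A28. Check directory permissions with icacls: scan every directory of interest."""
    observations = []
    for folder in aux_memory.get("interesting_folders", []):
        # One low-level Windows command per argument taken from the auxiliary memory.
        cmd = f'icacls.exe "{folder}"'
        observations.append((cmd, ssh_exec(cmd)))
    return observations  # parsed later to fill the 'writable folder' attributes of Table 2
```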
5. Experiments
5.1. Simulator of a Windows 7 Virtual Machine
A key practical challenge for training a reinforcement learning agent to perform red teaming tasks is the slow simulation speed when performing actions on a real virtual machine. For example, running commands necessary for privilege escalation can take longer than a minute on a full-featured Windows 7 virtual machine, even if the agent acts optimally. At the beginning of training, when the agent selects actions very close to randomly, one episode of training on a real VM can last significantly longer. Moreover, each training episode requires a new virtual machine that has been configured with one of the available vulnerabilities. Provisioning and configuring a virtual machine in such a manner will further add to the time it would take to train the agent. Training a successful agent may require thousands of training episodes, which can take a prohibitively large amount of time when training on a real operating system. Developing an infrastructure to tackle the long simulation times on a real system is a significant challenge, and it is left outside the scope of this study.
To alleviate this issue, we implemented a simulated Python environment that emulates the behavior of a genuine Windows 7 operating system relevant to the privilege escalation task. The simulation consists of, among others, a file system with access controls, Windows registry, AutoRuns, scheduled tasks, users, executables, and services. Using this environment, the actions taken by the agent can be simulated in a highly efficient manner. Moreover, creating simulated machines with random vulnerabilities for training requires little programming and computing power and can be done very fast. However, to determine whether training the agent in a simulated environment instead of a real operating system is feasible, the trained agent will be evaluated by testing it on a vulnerable Windows 7 virtual machine.
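The interface of the simulator is not described in detail in the paper; the following Gym-style sketch, with hypothetical helper names, illustrates one plausible shape of the reset/step loop used for training:

```python
class SimulatedWindows7Env:
    """Sketch of the simulated environment interface (names are illustrative, not the paper's code)."""

    def reset(self):
        # Sample a fresh host: random services, tasks, AutoRuns and one of the 12 vulnerabilities.
        self.host = sample_vulnerable_host()          # hypothetical helper
        return ""                                     # the agent starts with no observation

    def step(self, action_id: int):
        output = self.host.execute(action_id)         # simulated command output (the observation)
        done = self.host.privileges_escalated()
        reward = 1.0 if done else 0.0                 # sparse terminal reward (magnitude assumed)
        return output, reward, done
```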
Figure 1. The architecture of the A2C agent. Colored boxes represent multilayer perceptrons (the same colors denote shared parameters), and the black circles represent the concatenation of input signals. The network takes the AutoRuns, services, and tasks as input and outputs the policy and the value.
5.2. The Architecture of the Agent
We use an A2C reinforcement learning agent (described in Section 3) in which the policy and value functions are modeled with neural networks. In practice, we use a single neural network with two heads: one produces the estimated value $V_\omega(\tilde{s}_t)$ of the state, and the other produces the probabilities $\pi_\theta(a_t \mid \tilde{s}_t)$ of taking one of the 38 actions. The network gets as inputs the state variables described in Tables 1 and 2. The complete model has less than 27,000 parameters.

The main challenge in designing the neural network is that the number of AutoRuns, services, tasks, and DLLs can vary across training episodes. A Windows host might have anything from dozens to thousands of services, and the number of tasks and AutoRuns might also vary significantly depending on the host. We want our agent to be able to generalize to any number of those. Therefore, we process the information about each service, AutoRun, and scheduled task separately and aggregate the outputs of these computational blocks using the maximum operation. The architecture of the neural network is presented in Figure 1. Computational blocks with shared parameters are shown with the same colors.
In the proposed architecture, we concatenate the max-aggregated outputs of the blocks that process the information about AutoRuns, tasks, and DLLs with the outputs of the blocks that process the information about the individual services. Intuitively, this corresponds to augmenting the service data with the information about the most vulnerable AutoRun and task. Note that we also augment the service data with the information about the DLLs used by the corresponding service. Then, we pass the concatenated information through multilayer perceptrons that output value estimates and policies for all services. Finally, we regard the service with the highest value estimate as the most vulnerable one and select the policy corresponding to that service as the final policy.
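A condensed PyTorch sketch of our reading of this architecture; the hidden sizes, input dimensionalities, and the way the general state variables and the per-service DLL information are fed in are assumptions, and the real model (fewer than 27,000 parameters) differs in detail:

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=32):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))

class A2CNet(nn.Module):
    """Policy/value network that generalizes to any number of services, AutoRuns, and tasks."""

    def __init__(self, d_gen=27, d_srv=11, d_auto=2, d_task=3, d_emb=16, n_actions=38):
        super().__init__()
        self.srv_block = mlp(d_gen + d_srv, d_emb)     # shared over all services
        self.auto_block = mlp(d_auto, d_emb)           # shared over all AutoRuns
        self.task_block = mlp(d_task, d_emb)           # shared over all tasks
        self.policy_head = mlp(3 * d_emb, n_actions)   # per-service policy logits
        self.value_head = mlp(3 * d_emb, 1)            # per-service value estimate

    def forward(self, gen, services, autoruns, tasks):
        # gen: (d_gen,), services: (S, d_srv), autoruns: (A, d_auto), tasks: (T, d_task)
        srv = self.srv_block(torch.cat([gen.expand(services.shape[0], -1), services], dim=-1))
        auto = self.auto_block(autoruns).max(dim=0).values     # max-aggregated AutoRun features
        task = self.task_block(tasks).max(dim=0).values        # max-aggregated task features
        per_srv = torch.cat([srv, auto.expand_as(srv), task.expand_as(srv)], dim=-1)
        values = self.value_head(per_srv).squeeze(-1)          # (S,) per-service values
        logits = self.policy_head(per_srv)                     # (S, n_actions) per-service policies
        best = torch.argmax(values)                            # most vulnerable service
        return values[best], torch.softmax(logits[best], dim=-1)
```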
5.3. Training
Training consists of episodes in which the agent interacts with one instance of a sampled environment. At the beginning of each episode, the agent has no knowledge of the environment. The empty agent state is fed into the neural network that produces the value and the policy outputs. The action is sampled from the probabilities given by the policy output. Thus, we do not use explicit exploration strategies such as epsilon-greedy. The selected action is performed in the simulated environment, and the reward and the observations received as a result of the action are passed to the agent. The observations are parsed to update the agent state as described in Section 4.3. Then, a new action is selected based on the updated state. The iteration continues until the maximum number of steps for one episode is reached, or the agent has successfully performed privilege escalation.
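A sketch of one training episode following this procedure, reusing the hypothetical environment, network, and update function sketched earlier (AgentState is likewise a placeholder for the state-tracking logic of Section 4.3):

```python
import torch

def run_episode(env, net, optimizer, max_steps=1000, gamma=0.995):
    agent_state = AgentState()                            # empty agent state at episode start
    log_probs, values, rewards = [], [], []
    observation, done = env.reset(), False
    for _ in range(max_steps):
        value, probs = net(*agent_state.as_tensors())
        dist = torch.distributions.Categorical(probs)
        action = dist.sample()                            # sample from the policy, no epsilon-greedy
        observation, reward, done = env.step(action.item())
        agent_state.update(action.item(), observation)    # parse the command output (Section 4.3)
        log_probs.append(dist.log_prob(action))
        values.append(value)
        rewards.append(reward)
        if done:
            break
    a2c_update(optimizer, log_probs, values, rewards, gamma)  # end-of-episode parameter update
    return sum(rewards), len(rewards)
```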
The parameters of the neural network are updated at the end of each episode. The gradient (1) is computed automatically by backpropagation with PyTorch (Paszke et al., 2019), and the parameters are updated using the Adam optimizer (Kingma and Ba, 2014). The agent is trained for as long as the average reward per episode continues to increase. The hyperparameters used for training the agent are given in Appendix A.

Figure 2 presents the evolution of the episode length (averaged over 100 episodes) during one training run. The average episode length starts from around 200, and it gradually decreases, reaching a level slightly above 11 after approximately 30,000 training episodes. We use 1,000 as the maximum number of steps per episode (see Appendix A), which implies that the agent manages to solve the problem and gets rewards from the very beginning of the optimization procedure when it takes close-to-random actions. We estimated that the average episode length is approximately 10.7 actions if the agent acts according to the optimal policy. Thus, the results indicate that the agent has learned to master the task of privilege escalation.

Figure 2. The average episode length during training: it starts from around 200 and decreases until it reaches a level slightly above 11 after approximately 30,000 training episodes.
Training the agent for 50,000 episodes in the simulated environment (see Section 5.1) takes less than two hours without any significant code optimizations using a single NVIDIA GeForce GTX 1080 Ti GPU, which is a high-end consumer GPU from 2017. Note that performing the same training on a real Windows 7 virtual machine could take weeks.
5.4. Testing the Agent
Next, we test whether the agent trained in our simulated environment can transfer to a real Windows 7 operating system without any adaptation. We also compare the performance of our agent to two baselines: a ruleset crafted by an expert that can be interpreted as the optimal policy and a random policy.
We create a Windows 7 virtual machine using Hyper-V provided by the Windows 10 operating system. We assume that the offensive actor has gained low-level access with code execution rights by performing, for example, a successful penetration test or a phishing campaign. This is simulated by installing an SSH server on the victim host. We use the Paramiko SSH library in Python to connect to the virtual machine and execute commands with user-level credentials (Forcier, 2021). We use Hyper-V to create a Kali Linux virtual machine with Metasploit for generating malicious executables. However, instead of using Metasploit for creating malicious DLLs, the agent has to modify and compile a custom DLL code by taking action A3 Compile a custom malicious DLL in Kali Linux. Paramiko is also used to connect to the Kali machine.
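A minimal Paramiko sketch of this command-execution channel; the host address and credentials are placeholders:

```python
import paramiko

def ssh_exec(host, username, password, command):
    """Run one command on the victim (or Kali) VM and return its output as the observation."""
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # lab VM, no known_hosts
    client.connect(host, username=username, password=password)
    try:
        _, stdout, stderr = client.exec_command(command)
        return stdout.read().decode(errors="replace") + stderr.read().decode(errors="replace")
    finally:
        client.close()

# Example: action A37 on the Windows 7 VM (IP and credentials are placeholders).
# print(ssh_exec("192.168.1.10", "user", "password", "whoami"))
```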
Using SSH for simulating low-level code execution rights on the victim Windows 7 has some limitations. Some of the Windows command-line utilities, such as wmic and accesschk64, are blocked for non-privileged users over SSH. To overcome this limitation of the test scenario, we open a second SSH session using elevated credentials and run the blocked commands in that session. In practice, a malicious actor would be able to execute these utilities while accessing the victim's environment via a reverse shell or meterpreter session. Care was taken to prevent the test infrastructure from affecting the target environment. For example, because an SSH tunnel was chosen as an off-the-shelf communication channel for testing purposes, the agent does not target any SSH-related vulnerabilities. Engineering effort was not spent on creating a production-ready attack agent, as that was considered beyond the scope of the research.
In order to take some of the actions listed in Table 4, the victim host has to have Windows Sysinternals with accesschk64. Moreover, we need an executable for scanning the DLLs loaded by the PE files. We used an open-source solution for that, but it failed to detect a DLL loaded by a handcrafted service executable. To work around this issue, we hard-coded the result of the scan in the agent. To properly address this issue, the high-level action of performing PE scanning could be mapped to a script that uploads the service executable on a Windows machine and uses ProcMon from Sysinternals to analyze the DLLs loaded by the service executable. Alternatively, a superior PE analyzer could be used.
First, we tested our agent on a virtual machine without external antivirus (AV) software or an intrusion detection system, but with an up-to-date and active Windows Defender (which is essentially only an anti-spyware program in Windows 7). We kept the number of services similar to the number of services during training by excluding all services in C:\Windows from the list of services gathered by the agent. We made the agent deterministically select the action with the highest probability. The agent was successful in exploiting all twelve vulnerabilities. Examples of the sequences of actions taken by the agent during evaluation can be found in Tables 5–8. The agent took very few unnecessary actions. The performance (measured in terms of the number of actions) could be improved by gathering more information before scanning for directory permissions: currently, the agent scans the directory permissions immediately after finding interesting directories. However, the amount of noise generated would have been similar, because the agent would have executed more Windows commands per high-level action.
Table 5. Example evaluation episode: missing DLL on the Windows path.
A35. Check for passwords in Winlogon registry
A37. Get the current user
A31. Get a list of services
A28. Check directory permissions with icacls
A36. Get a list of local users and administrators
A26. Check the ACLs of service registries with Get-ACL
A25. Check service permissions with accesschk64
A34. Check AlwaysInstallElevated bits
A32. Get a list of AutoRuns
A28. Check directory permissions with icacls
A27. Check executable permissions with icacls
A22. Search for unattend* sysprep* unattended* files
A33. Get a list of scheduled tasks
A28. Check directory permissions with icacls
A27. Check executable permissions with icacls
A29. Analyze service executables for DLLs
A30. Search for DLLs
A28. Check directory permissions with icacls
A38. Get the Windows path
A28. Check directory permissions with icacls
A3. Compile a custom malicious DLL in Kali Linux
A7. Download a malicious DLL in Windows
A16. Move a malicious DLL to a folder on Windows path to replace a missing DLL
A9. Start an exploited service

Table 6. Example evaluation episode: missing service binary in a writable service folder.
A35. Check for passwords in Winlogon registry
A37. Get the current user
A31. Get a list of services
A28. Check directory permissions with icacls
A2. Create a malicious service executable in Kali Linux
A6. Download a malicious service executable in Windows
A13. Overwrite a service binary
A9. Start an exploited service

Table 7. Example evaluation episode: credentials in the WinLogon registry.
A35. Check for passwords in Winlogon registry
A36. Get a list of local users and administrators
A24. Test credentials

Table 8. Example evaluation episode: AlwaysInstallElevated bits set.
A35. Check for passwords in Winlogon registry
A37. Get the current user
A31. Get a list of services
A28. Check directory permissions with icacls
A36. Get a list of local users and administrators
A26. Check the ACLs of service registries with Get-ACL
A25. Check service permissions with accesschk64
A34. Check AlwaysInstallElevated bits
A4. Create a malicious MSI file in Kali Linux
A8. Download a malicious MSI file in Windows
A21. Install a malicious MSI file
After that, we no longer excluded the services in C:\Windows from the list and let the agent perform privilege escalation on the full set of services. The increased number of services had no negative effect, and the agent was successful at the task. An example sequence of commands is given in Appendix C. However, because the agent performs each selected action on every applicable service, it generates some noise by scanning through the permissions of all services in C:\Windows, which could have caused an alert in an advanced detection and response system.
The number of actions used by the agent to escalate privileges during the testing phase is given in Table 9. We compare the following agents:
- the oracle agent, which assumes complete knowledge of the system, including the vulnerability;
- the optimal policy, which is approximated using a fixed ruleset crafted by an expert;
- the deterministic RL agent, which selects the action with the highest probability;
- the stochastic RL agent, which samples the action from the probabilities produced by the policy network;
- an agent taking random actions.
For all trials involving randomness, we used 1,000 samples and computed the average number of actions used by the agent. Because of the computational cost of running thousands of episodes, all tests involving randomness were run in a simulated environment similar to the testing VM. The results suggest that the policy of the deterministic agent is close to optimal. Adding stochasticity to the action selection has a slightly negative effect on the performance, but it increases the variability of the agent's actions, making the agent potentially more difficult to detect.
Vulnerability | Oracle (full knowledge) | Expert (approx. optimal policy) | Deterministic RL | Stochastic RL | Random |
---|---|---|---|---|---|
1 | 10 | 20 | 24 | 25.3 | 231.2 |
2 | 5 | 10 | 9 | 8.0 | 152.2 |
3 | 7 | 7 | 8 | 8.2 | 206.0 |
4 | 5 | 9 | 8 | 9.0 | 147.7 |
5 | 7 | 10 | 15 | 11.5 | 212.2 |
6 | 7 | 7 | 8 | 8.3 | 208.1 |
7 | 6 | 9 | 14 | 14.1 | 171.7 |
8 | 5 | 15 | 11 | 11.0 | 156.0 |
9 | 3 | 11 | 3 | 10.9 | 96.0 |
10 | 4 | 13 | 14 | 13.5 | 162.8 |
11 | 6 | 9 | 17 | 17.6 | 166.9 |
12 | 6 | 8 | 13 | 13.3 | 165.0 |
AVG | 5.9 | 10.7 | 12.0 | 12.6 | 173.0 |
We additionally tested the ability of the agent to generalize to multiple vulnerabilities which might be simultaneously present in the system. This was done in three ways. First, the agent was evaluated in an environment with six different types of vulnerable services present. Second, the agent was evaluated in an environment with all twelve vulnerabilities present. Finally, random combinations of any two vulnerabilities were tested. The agent had little trouble performing privilege escalation in any of these scenarios.
As a matter of interest, we finally evaluated the agent’s performance against a host running an up-to-date version of a standard endpoint protection software, Microsoft Security Essentials, with real-time protection enabled. As expected, the AV software managed to recognize the default malicious executables created by msfvenom in Kali Linux. However, the AV software failed to recognize the custom DLL compiled by the agent, and hence, privilege escalation using DLL hijacking was possible. Moreover, the AV software failed to detect any methods that did not involve a downloaded malicious payload, such as re-configuring the vulnerable service to execute a command that added the user as a local administrator. Hence, privilege escalation was possible in many of the scenarios, even with up-to-date AV software present. It should be noted that these techniques fall beyond the scope of file-based threat detection used by standard antivirus software and would require more advanced protection strategies to counter, such as behavioral- or heuristics-based detection. The agent’s performance against such detection engines was considered to be beyond the scope of the project and was not assessed.
6. Discussion
Our work demonstrates that it is possible to train a deep reinforcement learning agent to perform local privilege escalation in Windows 7 using known vulnerabilities. Our method is the first reinforcement learning agent, to the best of our knowledge, that performs privilege escalation with an extensive state space and a high-level action space with easily customizable low-level implementations. Despite being trained in simulated environments, the test results demonstrate that our agent can solve the formalized privilege escalation problem in a close-to-optimal fashion on full-featured Windows machines with realistic vulnerabilities.
The efficacy of our implementation is limited if up-to-date antivirus software is running on the victim host because only a handcrafted DLL is used, whereas the malicious executables are created using Metasploit with default settings. However, if the mapping from the high-level actions to the low-level commands (see Table 4) was improved so that more sophisticated payloads were used or the action space was expanded with actions for defense evasion, a reinforcement learning agent could be capable of privilege escalation in hosts with up-to-date antivirus software but without an advanced detection and response system.
While simple attacks are likely to be detected by advanced breach detection solutions, not all companies employ those for various reasons. The constant stream of breaches seen in the news reflects that reality. Moreover, if adversaries develop RL-based tools for learning and automating adversarial actions, they might prefer to target networks that are less likely to be running breach detection software.
The current threat level presented by reinforcement learning agents is most likely limited to agents capable of exploiting existing well-known vulnerabilities. The same could be achieved by a scripted agent with a suitable hard-coded logic. However, the RL approach offers a number of benefits compared to a scripted agent:
- Scripting an attacking agent can be difficult when the number of potentially exploitable vulnerabilities grows and if the attacked system contains an IDS.
- The probabilistic approach of our RL agent produces more varied attacks (and attempted attacks) than a scripted robot following hard-coded rules, which makes our agent more useful for training ML-based defenses and for testing and evaluating intrusion detection systems.
- An RL agent may be quickly adapted to changes in the environment. For example, if certain sequences of actions raise an alarm in an intrusion detection system, the agent might learn to take a different route that is not detectable by the IDS. This would produce invaluable information for strengthening the defense system.
In the long run, RL agents could have the potential to discover and exploit novel unseen vulnerabilities, which would have an enormous impact on the field. To implement this idea, agents would most likely need to interact with an authentic environment, which would require a great deal of engineering effort and a huge amount of computational resources. Crafting the action space would nevertheless be most likely unavoidable within the constraints of the current RL methods. However, the development should go in the direction of making the actions more atomic and minimizing the amount of prior knowledge used in designing the action space. This could allow the agent to encompass more vulnerabilities and could be a way to get closer to the ultimate goal of discovering new vulnerabilities.
Another research direction is to increase the complexity of the learning task. In this first step, we wanted to understand how RL-powered attacks could work in a constrained, varied setup, and our key result is showing that the RL approach works for such a complex learning task. Defeating defensive measures or expanding to a wider range of target environments would be a research topic with a significantly larger scope. It would be interesting to see whether a reinforcement learning agent can perform more steps in the cyber security kill chain, such as defense evasion. It would also be interesting to train the agent in an environment with an intrusion detection system or a defensive RL agent and perform multi-agent reinforcement learning, which has been done in previous research on post-exploitation in simulated environments (Elderman et al., 2017).
Ethical Considerations
The primary goal of this work is to contribute to building resilient defense systems that can detect and prevent various types of potential attacks. Rule-based defense systems can be effective, but as the number of attack scenarios grows, they become increasingly difficult to build and maintain. Data-driven defensive systems trained with machine learning offer a promising alternative, but implementing this idea in practice is hampered by the scarcity of available training data in this domain. In this work, we present a possible solution to this problem by training a reinforcement learning agent to perform malicious activities and therefore to generate invaluable training data for improving defense systems. The presented approach can also support the red teaming activities performed by cyber security experts by automating some steps of the kill chain. However, the system developed in this project could be dangerous in the wrong hands. Hence, the code created in this project will not be open-sourced or released to the public.
Acknowledgements.
We thank the Security and Software Engineering Research Center SERC for funding our work. In addition, we thank David Karpuk, Andrew Patel, Paolo Palumbo, Alexey Kirichenko, and Matti Aksela from F-Secure for their help in running this project and their domain expertise, Tuomas Aura for giving us highly valuable feedback, and the Academy of Finland for the support within the Flagship Programme Finnish Center for Artificial Intelligence (FCAI).

References
- Alauthman et al. (2020). An efficient reinforcement learning-based botnet detection approach. Journal of Network and Computer Applications 150, pp. 102479.
- Anderson et al. (2018). Learning to evade static PE machine learning malware models via reinforcement learning. arXiv preprint arXiv:1801.08917.
- Applebaum et al. (2016). Intelligent, automated red team emulation. In Proceedings of the 32nd Annual Conference on Computer Security Applications, pp. 363–373.
- Apruzzese et al. (2018). On the effectiveness of machine and deep learning for cyber security. In 2018 10th International Conference on Cyber Conflict (CyCon), pp. 371–390.
- Bland et al. (2020). Machine learning cyberattack and defense strategies. Computers & Security 92, pp. 101738.
- Caturano et al. (2021). Discovering reflected cross-site scripting vulnerabilities using a multiobjective reinforcement learning environment. Computers & Security 103, pp. 102204.
- Çavuşoğlu (2019). A new hybrid approach for intrusion detection using machine learning methods. Applied Intelligence 49 (7), pp. 2735–2761.
- Chatterjee and Namin (2019). Detecting phishing websites through deep reinforcement learning. In 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), Vol. 2, pp. 227–232.
- Chen et al. (2019). Adversarial attack and defense in reinforcement learning from AI security view. Cybersecurity 2 (11), pp. 1–22.
- Chowdhary et al. (2020). Autonomous security analysis and penetration testing. In 2020 16th International Conference on Mobility, Sensing and Networking (MSN), pp. 508–515.
- Cui et al. (2018). Detection of malicious code variants based on deep learning. IEEE Transactions on Industrial Informatics 14 (7), pp. 3187–3196.
- Elderman et al. (2017). Adversarial reinforcement learning in a cyber security simulation. In ICAART (2), pp. 559–566.
- Erdődi and Zennaro (2021). The agent web model: modeling web hacking for reinforcement learning. International Journal of Information Security, pp. 1–17.
- Fang et al. (2019a). Feature selection for malware detection based on reinforcement learning. IEEE Access 7, pp. 176177–176187.
- Fang et al. (2019b). Evading anti-malware engines with deep reinforcement learning. IEEE Access 7, pp. 48867–48879.
- Ferdowsi et al. (2018). Robust deep reinforcement learning for security and safety in autonomous vehicle systems. In 2018 21st International Conference on Intelligent Transportation Systems (ITSC), pp. 307–312.
- Forcier (2021). Paramiko. External link.
- Ghanem and Chen (2018). Reinforcement learning for intelligent penetration testing. In 2018 Second World Conference on Smart Trends in Systems, Security and Sustainability (WorldS4), pp. 185–192.
- Ghanem and Chen (2020). Reinforcement learning for efficient network penetration testing. Information 11 (1), pp. 6.
- Greenberg (2018). The untold story of NotPetya, the most devastating cyberattack in history. Wired, August 22.
- Han et al. (2017). Two-dimensional anti-jamming communication based on deep reinforcement learning. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2087–2091.
- Han et al. (2018). Reinforcement learning for autonomous defence in software-defined networking. In International Conference on Decision and Game Theory for Security, pp. 145–165.
- He et al. (2016). Faster learning and adaptation in security games by exploiting information asymmetry. IEEE Transactions on Signal Processing 64 (13), pp. 3429–3443.
- Hendler et al. (2018). Detecting malicious PowerShell commands using deep neural networks. In Proceedings of the 2018 on Asia Conference on Computer and Communications Security, pp. 187–197.
- Huber (1992). Robust estimation of a location parameter. In Breakthroughs in Statistics, pp. 492–518.
- Kim et al. (2018). A multimodal deep learning method for Android malware detection using various features. IEEE Transactions on Information Forensics and Security 14 (3), pp. 773–788.
- Kingma and Ba (2014). Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- Maeda and Mimura (2021). Automating post-exploitation with deep reinforcement learning. Computers & Security 100, pp. 102108.
- McKinnel et al. (2019). A systematic literature review and meta-analysis on artificial intelligence in penetration testing and vulnerability assessment. Computers & Electrical Engineering 75, pp. 175–188.
- Milosevic et al. (2017). Machine learning aided Android malware classification. Computers & Electrical Engineering 61, pp. 266–274.
- Mnih et al. (2016). Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, pp. 1928–1937.
- Nguyen and Reddi (2019). Deep reinforcement learning for cyber security. arXiv preprint arXiv:1906.05799.
- Ni and Paul (2019). A multistage game in smart grid security: a reinforcement learning solution. IEEE Transactions on Neural Networks and Learning Systems 30 (9), pp. 2684–2695.
- Paszke et al. (2019). PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, pp. 8024–8035.
- Rapid7 (2021). Metasploit Framework. External link.
- Sutton and Barto (2018). Reinforcement learning: an introduction. MIT Press.
- Takaesu (2018). DeepExploit. External link.
- Team (2021). CyberBattleSim. External link.
- Wan et al. (2017). Reinforcement learning based mobile offloading for cloud-based malware detection. In GLOBECOM 2017 – 2017 IEEE Global Communications Conference, pp. 1–6.
- Wang et al. (2019). Deep reinforcement learning for green security games with real-time information. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 1401–1408.
- Xiao et al. (2015). Spoofing detection with reinforcement learning in wireless networks. In 2015 IEEE Global Communications Conference (GLOBECOM), pp. 1–5.
- Xiao et al. (2018). Security in mobile edge caching with reinforcement learning. IEEE Wireless Communications 25 (3), pp. 116–122.
- Zennaro and Erdodi (2021). Modeling penetration testing with reinforcement learning using capture-the-flag challenges: trade-offs between model-free learning and a priori knowledge. arXiv preprint arXiv:2005.12632.
Appendix A Supplementary material for RL
The hyperparameters used for training are listed below:
| Parameter | Value |
|---|---|
| Discount rate | 0.995 |
| Learning rate for Adam | 0.001 |
| First moment decay rate for Adam | |
| Maximum number of steps per episode | 1,000 |
Appendix B Actions Required to Exploit the Vulnerabilities
The sequences of actions that can be used to exploit the twelve vulnerabilities are presented in this section. The actions used in multiple scenarios are marked with the blue color.
(1.1) Missing DLL:
A37. Get the current user
A31. Get a list of services
A29. Analyze service executables for DLLs
A30. Search for DLLs
A38. Get the Windows path
A28. Check directory permissions with icacls
A3. Compile a custom malicious DLL in Kali Linux
A7. Download a malicious DLL in Windows
A16. Move a malicious DLL to a folder on Windows path
to replace a missing DLL
A9. Start an exploited service
(1.2) Writable DLL:
A37. Get the current user
A31. Get a list of services
A29. Analyze service executables for DLLs
A30. Search for DLLs
A27. Check executable permissions with icacls
A3. Compile a custom malicious DLL in Kali Linux
A7. Download a malicious DLL in Windows
A15. Overwrite a DLL
A9. Start an exploited service
(2) Re-configurable Service:
A37. Get the current user
A31. Get a list of services
A25. Check service permissions with accesschk64
A18. Re-configure service to add the user to local
administrators
A9. Start an exploited service
(3) Unquoted Service Path:
A37. Get the current user
A31. Get a list of services
A28. Check directory permissions with icacls
A2. Create a malicious service executable in Kali Linux
A6. Download a malicious service executable in Windows
A14. Move a malicious executable so that it is executed by
an unquoted service path
A9. Start an exploited service
(4) Modifiable ImagePath:
A37. Get the current user
A31. Get a list of services
A26. Check the ACLs of the service registry with Get-ACL
A20. Change service registry to add the user to local
administrators
A9. Start an exploited service
(5) Writable Service Executable:
A37. Get the current user
A31. Get a list of services
A27. Check executable permissions with icacls
A2. Create a malicious service executable in Kali Linux
A6. Download a malicious service executable in Windows
A13. Overwrite a service binary
A9. Start an exploited service
(6) Missing Service Executable:
A37. Get the current user
A31. Get a list of services
A28. Check directory permissions with icacls
A2. Create a malicious service executable in Kali Linux
A6. Download a malicious service executable in Windows
A13. Overwrite a service binary
A9. Start an exploited service
(7) Writable AutoRun Executable:
A37. Get the current user
A32. Get a list of AutoRuns
A27. Check executable permissions with icacls
A1. Create a malicious executable in Kali Linux
A5. Download a malicious executable in Windows
A11. Overwrite the executable of an AutoRun
(8) AlwaysInstallElevated:
A37. Get the current user
A34. Check AlwaysInstallElevated bits
A4. Create a malicious MSI in Kali Linux
A8. Download a malicious MSI in Windows
A21. Install a malicious MSI file
(9) WinLogon Registry:
A35. Check for passwords in Winlogon registry
A36. Get a list of local users and administrators
A24. Test credentials
(10) Unattend File:
A22. Search for unattend* sysprep* unattended* files
A23. Decode base64 credentials
A36. Get a list of local users and administrators
A24. Test credentials
(11) Writable Task Binary:
A37. Get the current user
A33. Get a list of scheduled tasks
A27. Check executable permissions with icacls
A1. Create a malicious executable in Kali Linux
A5. Download a malicious executable in Windows
A12. Overwrite the executable of a scheduled task
(12) Writable Startup Folder:
A37. Get the current user
A32. Get a list of AutoRuns
A28. Check directory permissions with icacls
A1. Create a malicious executable in Kali Linux
A5. Download a malicious executable in Windows
A11. Overwrite the executable of an AutoRun
Appendix C Command-line example
We exemplify our mapping from actions to commands by showing the commands taken by the agent to escalate privileges by exploiting a service with weak folder permissions and a missing binary (see Table 6). Note that all the commands disclosed below have been derived from public sources (given in the footnotes) and can be recreated by security practitioners. Furthermore, none of the commands are proprietary to F-Secure.
A35. Check for passwords in Winlogon registry (https://github.com/sagishahar/lpeworkshop/blob/master/Lab%20Exercises%20Walkthrough%20-%20Windows.pdf):
reg query "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon" /v DefaultUsername
reg query "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon" /v DefaultPassword
A37. Get the current user (https://sushant747.gitbooks.io/total-oscp-guide/content/privilege_escalation_windows.html):
whoami
A31. Get a list of services (https://book.hacktricks.xyz/windows/windows-local-privilege-escalation):
wmic service get name,pathname,startname,startmode,started
/format:csv
A28. Check directory permissions with icacls:
icacls.exe "c:\windows\system32"
icacls.exe "c:\windows"
(15 rows skipped)
icacls.exe "c:\program files (x86)\microsoft\edge"
icacls.exe "c:\program files\missing file service"
icacls.exe "c:\program files"
(4 rows skipped)
icacls.exe "c:\windows\system32\wbem"
icacls.exe "c:\program files\windows media player"
A2. Create a malicious service executable in Kali Linux (https://infosecwriteups.com/privilege-escalation-in-windows-380bee3a2842):
sudo -S msfvenom -p windows/exec CMD='net localgroup administrators user /add' -f exe-service -o java_updater_svc
A6. Download a malicious service executable in Windows (https://adamtheautomator.com/powershell-download-file/):
powershell.exe -command "Invoke-WebRequest -Uri '82.130.20.144/java_updater_service' -OutFile 'C:\Users\user\Downloads\java_updater_svc'"
move /y "C:\Users\user\Downloads\java_updater_svc" "C:\Users\user\Downloads\java_updater_svc.exe"
A13. Overwrite a service binary:
copy /y "C:\Users\user\Downloads\java_updater_svc.exe" "c:\program files\missing file service\missingservice.exe"
A9. Start an exploited service:
sc start missingsvc