Artificial intelligence (AI) has been widely used in many areas and has achieved tremendous progress in fields like computer vision, biomedicine, autonomous driving, intelligent marketing, and network security. AI's capacity for rapid recognition, response and autonomous learning can solve problems in network security. With the deep integration of AI and network security, achievements in malware monitoring, intrusion detection, situation analysis, anti-fraud, etc., have been made.
However, while AI helps promote network defense, it can also become a suitable carrier for attack activities. Neural network models are complex and poorly explainable. It is hard to know how a model makes decisions, and it is also challenging to reverse the decision-making process. Therefore, neural network models are regarded as a sort of "black box". Some attack scenarios have been proposed based on these properties, like DeepLocker  and DeepC2 . AI enhances malware's abilities, making malware more concealed and more resistant to forensic analysis.
StegoNet  proposes to turn neural network models into stegomalware. With the rise of Machine Learning as a Service (MLaaS) [3, 11, 23] and open machine learning markets, attackers can use supply chain pollution to spread customized models through MLaaS providers and ML markets. Therefore, StegoNet embeds malware payloads inside neural network models and spreads them through the ML market. The parameters in the model are replaced or mapped with malware bytes. Meanwhile, the model's performance is maintained due to the complexity and fault tolerance of the models. By adopting LSB substitution, resilience training, value-mapping and sign-mapping, the malware can be embedded into mainstream neural network models under different conditions.
The strengths of hiding malware in neural network models are as follows: i) By hiding the malware inside neural network models, the malware is disassembled and its characteristics become unavailable, so the malware can evade detection. ii) Because of the redundant neurons and excellent generalization ability, the modified neural network models can maintain their performance in different tasks without causing abnormalities. iii) The neural network models used in specific tasks are large, so large-sized malware can be delivered. iv) This method does not rely on other system vulnerabilities. The malware-embedded models can be delivered through model update channels in the supply chain or other ways that do not attract the end user's attention. v) As neural networks become more widely used, this method will become a universal way of delivering malware.
However, StegoNet still has some deficiencies. First, it has a low embedding rate (defined as malware size / model size). In StegoNet, the upper bound of the embedding rate without accuracy degradation is 15%, which is not sufficient to embed large-sized malware into medium- or small-sized models. Second, the methods in StegoNet have a significant impact on the model's performance. The accuracy of the models drops significantly as the malware size increases, especially for small-sized models, which makes StegoNet nearly inapplicable to small models. Additionally, StegoNet needs extra effort to embed or extract the malware. Extra training or an index permutation is required during embedding, making StegoNet impractical.
To overcome these deficiencies, we propose EvilModel, which embeds malware in neural network models with a high embedding rate and low performance impact. We analyzed the composition of neural network models and studied how malware can be embedded and how much. Based on this analysis, we propose three embedding methods, MSB reservation, fast substitution and half substitution. To demonstrate the feasibility, we embedded 19 malware samples in 10 mainstream neural network models using the proposed methods and analyzed the performance of the malware-embedded models. We also propose an evaluation method combining the embedding rate, the performance impact, and the embedding effort to evaluate the proposed methods. To demonstrate the potential threat of this attack, we present a case study on a possible attack scenario with a self-trained model and WannaCry, and further explore the embedding capacity with a case study on AlexNet.
The contributions of this paper are summarized as follows:
We propose three methods to embed malware in neural network models with a high embedding rate and low performance loss. We built 550 malware-embedded models using 10 mainstream models and 19 malware samples, and evaluated their performance on ImageNet.
We propose a quantitative evaluation method to evaluate and compare the embedding methods.
We present a case study on the potential threat of the proposed attack. We trained a model to identify targets covertly and embedded WannaCry in the model. We designed a trigger to activate the extraction and execution of the malware.
We present a case study on a neural network model's embedding capacity and analyze the relationship between the model structure, the network layer, and the performance impact.
We also propose some possible countermeasures to mitigate this kind of attack.
Ethical Considerations. The combination of AI and cyber attacks is considered to be a coming trend. We cannot stop the evolution of cyberattacks, but we should draw attention to the defenses in advance. The goal of this work is not to inspire malware authors to write more efficient malware but to motivate security researchers and vendors to find solutions for an emerging threat. We intend to provide a possible scenario for security researchers and vendors to prevent this attack in advance.
The remainder of this paper is structured as follows. Section 2 describes relevant background and related works to this paper. Section 3 presents the methodology for embedding the malware. Section 4 is the experiment and evaluation of the proposed methods. Section 5 presents the case study on a potential threat. Section 6 is the case study on the embedding capacity. Section 7 discusses some possible countermeasures. Conclusions are summarized in Section 8.
2 Background and Related Work
2.1 Stegomalware and Steganography
Stegomalware is a type of advanced malware that uses steganography to evade detection. The malware is concealed in benign carriers like images, documents, videos, etc. A typical method in steganography is image-based LSB steganography . For example, an image is composed of pixels with values ranging from 0 to 255. When expressed in binary, the least significant bits have little effect on the picture's appearance, so they can be replaced by secret messages. In this way, messages are hidden in images. However, due to the low channel capacity, this method is not suitable for embedding large-sized malware.
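The LSB idea above can be sketched in a few lines. This is a minimal illustration (not a tool from the paper): one message bit replaces the least significant bit of each carrier byte, so capacity is only 1/8 of the carrier size, which is why the method cannot hold large malware.

```python
# Minimal sketch of image-based LSB steganography: hide a message in the
# least significant bit of each byte of an image-like byte array.

def embed_lsb(carrier: bytearray, message: bytes) -> bytearray:
    # Flatten the message into bits, most significant bit first.
    bits = [(byte >> i) & 1 for byte in message for i in range(7, -1, -1)]
    assert len(bits) <= len(carrier), "carrier too small"
    out = bytearray(carrier)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # replace only the LSB
    return out

def extract_lsb(carrier: bytes, n_bytes: int) -> bytes:
    bits = [b & 1 for b in carrier[: n_bytes * 8]]
    return bytes(
        sum(bit << (7 - j) for j, bit in enumerate(bits[i : i + 8]))
        for i in range(0, len(bits), 8)
    )

pixels = bytearray(range(256)) * 4      # stand-in for pixel data
stego = embed_lsb(pixels, b"secret")
assert extract_lsb(stego, 6) == b"secret"
```

Each modified byte changes by at most 1, which is visually imperceptible in an image, matching the "little effect on the picture's appearance" observation above.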
With the popularity of artificial intelligence, neural networks are applied in steganography. Volkhonski et al.  proposed SGAN, a GAN-based method for generating image containers. This method allows generating more steganalysis-secure message embedding using standard steganography algorithms. Zhang et al.  proposed a method that constructs enhanced covers against neural networks with the technique of adversarial examples. The enhanced covers and their corresponding stegos are most likely to be judged as covers by the networks. These methods are mainly applied to image steganography.
2.2 StegoNet
StegoNet  proposes to covertly deliver malware to end devices via malware-embedded neural network models from the supply chain, such as DNN model markets, MLaaS platforms, etc. StegoNet uses four methods to turn a neural network model into stegomalware: LSB substitution, resilience training, value-mapping and sign-mapping.
LSB substitution. Neural network models are redundant and fault-tolerant. By taking advantage of the sufficient redundancy in neural network models, StegoNet embeds malware bytes into the models by replacing the least significant bits of the parameters. For large-sized models, this method can embed large-sized malware without performance degradation. However, for small-sized models, the model performance drops sharply as more malware bytes are embedded.
Resilience training. As neural network models are fault-tolerant, StegoNet intentionally introduces internal errors into the neuron parameters by replacing the parameters with malware bytes. Then StegoNet "freezes" the neurons and retrains the model. The parameters in the "frozen" neurons are not updated during the retraining. An "index permutation" is needed to restore the embedded malware. Compared with LSB substitution, this method can embed more malware in a model; the experiments in StegoNet give an upper bound on the embedding rate for resilience training without accuracy degradation. There is still a significant impact on the model performance, although retraining is performed to restore it.
Value-mapping. StegoNet searches the model parameters to find bits similar to the malware segments and maps (or changes) the parameters to the malware. In this way, the malware can be mapped to a model without much degradation of the model performance. However, it also needs a permutation map to restore the malware, and the embedding rate is lower than that of the methods above.
Sign-mapping. StegoNet also maps the sign of the parameters to the malware bits. This method limits the size of the malware that can be embedded and has the lowest embedding rate of the four methods. Also, the permutation map will be huge, making this method impractical.
The common problems of these methods are that i) they have a low embedding rate, ii) they have a significant impact on the model performance, and iii) they require extra effort during embedding. These limitations prevent StegoNet from being effectively used in real-world scenarios.
2.3 DeepLocker
DeepLocker  builds a highly targeted covert attack by utilizing a neural network. Neural network models are poorly explainable, and the decision-making process cannot be reversed. Therefore, DeepLocker conceals the information about the specified target inside the neural network model and uses the model's output as a symmetric key to encrypt a malicious payload. As there is no specific pattern for the key or the target, if DeepLocker is analyzed by defenders, they cannot obtain any information about either. Therefore, the malicious payload cannot be decrypted and analyzed, and the intent of DeepLocker is hidden with the help of the model.
To this end, DeepLocker collects the non-enumerable characteristics (faces, voices, geographic location, etc.) of the target to train the model. The model generates a steady output, which is used as a secret key to encrypt the malicious payload. The encrypted payload and the model are delivered with benign applications, such as remote meeting apps, online telephony, etc. When the input attributes match the target's attributes, the target is considered found, and the secret key is derived from the model to decrypt the payload. If they do not match, there is no decryption key, and the intent of DeepLocker remains concealed. DeepLocker is regarded as a pioneering work on AI-powered attacks.
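The key-concealment idea can be sketched as follows. This is our own illustrative reconstruction, not DeepLocker's implementation: a key is derived from the model's steady output, and a hash-derived XOR keystream stands in for a real cipher.

```python
import hashlib

# Sketch of deriving a symmetric key from a model's output and using it
# to lock a payload. All names and the toy cipher are illustrative.

def key_from_output(outputs, theta=0.0) -> bytes:
    # Threshold the outputs into a bit string; a steady output on the
    # target yields the same bits, hence the same key.
    bits = "".join("1" if z >= theta else "0" for z in outputs)
    return hashlib.sha256(bits.encode()).digest()

def xor_stream(data: bytes, key: bytes) -> bytes:
    # Toy stream cipher: SHA-256 in counter mode, XORed with the data.
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))

target_out = [0.8, -0.2, 0.5, -0.9]        # model output on the target
key = key_from_output(target_out)
locked = xor_stream(b"payload", key)       # encrypt with the derived key
assert xor_stream(locked, key_from_output(target_out)) == b"payload"
```

A defender holding only `locked` and the model cannot recover the key without an input that reproduces the target's output, which mirrors the property described above.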
2.4 Malicious use of AI
AI exceeds many traditional methods in various fields. However, technologies can be abused for evil purposes, and it is possible to utilize AI to carry out network attacks that are more difficult to defend against. In 2018, 26 researchers  from different organizations warned against the malicious use of AI. They proposed potential scenarios combining AI with digital, physical, and political security. At the same time, AI-powered attacks are emerging.
For preparing an attack, Seymour et al.  proposed a highly targeted automated spear phishing method with AI. High-value targets are selected by clustering. Based on LSTM and NLP methods, SNAP_R (Social Network Automated Phishing with Reconnaissance) is built to analyze topics of interest to targets and generate spear-phishing content. The content is pushed to victims according to their active times. Hitaj et al.  proposed PassGAN to learn the distribution of real passwords from leaked passwords and generate high-quality password guesses. Tests on password datasets show that PassGAN performs better than other rules- or ML-based password guessing methods.
DeepC2  uses a DNN to build a block-resistant command and control channel on online social networks. It uses feature vectors from the botmaster for addressing; the vectors are extracted from the botmaster's avatars by a DNN model. Due to the poor explainability and complexity of DNN models, the bots can find the botmaster easily, while defenders cannot predict the botmaster's avatars in advance.
For detection evasion, MalGAN  was proposed to generate adversarial malware that can bypass black-box machine-learning-based detection models. A generative network is trained to minimize the malicious probabilities of the generated adversarial examples predicted by the black-box malware detector. More detection evasion methods [4, 41, 36] were proposed after MalGAN.
AI-powered attacks are emerging. Given AI's powerful abilities in automatic identification and decision-making, it is well worth the community's effort to mitigate this kind of attack before it is applied in real life.
3 Methodology
In this section, we introduce the methodologies for embedding malware inside a DNN model.
3.1 Analysis of the neurons
3.1.1 Neurons in a Network
A neural network model usually consists of an input layer, one or more hidden layer(s), and an output layer, as shown in Fig. 1. The input layer receives external signals and passes them to the hidden layers through the input-layer neurons. A hidden-layer neuron receives the incoming signals from the neurons of the previous layer, weighted by connection weights, adds a bias, and outputs the result to the next layer. The output layer is the last layer; it receives the incoming signals from the last hidden layer and processes them to produce the network's output.
A neuron in the hidden layer has a connection weight for each input signal from the previous layer. Assume the inputs of a neuron are X = (x1, x2, ..., xn) and the connection weights are W = (w1, w2, ..., wn), where n is the number of input signals (i.e., the number of neurons in the previous layer). The neuron receives the input signals and computes the weighted sum with the weights by matrix operations. Then a bias b is added to fit the objective function, so the output of the neuron is y = w1*x1 + w2*x2 + ... + wn*xn + b. We can see that each neuron contains n+1 parameters, i.e., n connection weights (n being the number of neurons in the previous layer) and one bias. Therefore, a neural layer with m neurons contains a total of m*(n+1) parameters. As each parameter is a 32-bit floating-point number, the size of the parameters in each layer is 32*m*(n+1) bits, which is 4*m*(n+1) bytes.
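The layer computation and parameter count above can be sketched as follows; the layer sizes are illustrative, not from the paper.

```python
import numpy as np

# One fully connected layer: m neurons, each with n weights and one
# bias, so the layer holds m*(n+1) parameters; stored as 32-bit floats
# that is 4*m*(n+1) bytes.

n, m = 2048, 512                                # inputs per neuron, neurons
W = np.random.randn(m, n).astype(np.float32)    # connection weights
b = np.random.randn(m).astype(np.float32)       # one bias per neuron

x = np.random.randn(n).astype(np.float32)       # incoming signals
y = W @ x + b                                   # layer output (pre-activation)

n_params = W.size + b.size
assert n_params == m * (n + 1)
print(n_params, n_params * 4)                   # parameter count, bytes
```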
3.1.2 Parameters in Neuron
As each parameter is a floating-point number, the attacker needs to convert the malware bytes into floating-point numbers to embed the malware. To do this, we need to analyze the distribution of the parameters.
Fig. 2 shows sample parameters from a randomly selected neuron in a model. There are 2048 parameters in the neuron. Among the 2048 values, there are 1001 negative numbers and 1047 positive numbers, a ratio of approximately 1:1, and they follow a nearly normal distribution. Among them, 11 have absolute values below a small threshold, accounting for 0.537%, and 97 fall below a slightly larger threshold, accounting for 4.736%. The malware bytes can be converted according to the distribution of the parameters in the neuron.
The attacker then needs to convert the malware bytes into 32-bit floating-point numbers within a reasonable interval. Fig. 3 shows the format of a 32-bit floating-point number conforming to the IEEE 754 standard . In binary, such a number has the form (-1)^s * 1.m * 2^e. The 1st bit is the sign bit s, representing the sign of the value. The 2nd-9th bits are the exponent, stored with a bias of 127, which can represent exponents e in the range -126 to 127. The 10th-32nd bits are the mantissa, representing the fractional part m of 1.m. By analyzing the format of floating-point numbers, it can be found that the absolute value of a number is mainly determined by the exponent part, i.e., the 2nd-9th bits, which are located mainly in the first byte of the number. Therefore, we can keep the first (two) byte(s) unchanged and replace the remaining bytes with malware bytes to embed the malware into DNN models.
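The byte layout can be checked with the standard `struct` module. The sketch below reproduces the sample parameter mentioned in Sec. 3.2.1 (0xBC40B763) and shows that replacing only the trailing bytes leaves the magnitude on the same order.

```python
import struct

# Inspect the byte layout of a 32-bit float (IEEE 754, big-endian view):
# the sign bit and all 8 exponent bits sit almost entirely in the first
# byte, so changing only the trailing bytes perturbs the value slightly.

def f32_bytes(x: float) -> bytes:
    return struct.pack(">f", x)

def bytes_f32(b: bytes) -> float:
    return struct.unpack(">f", b)[0]

b = f32_bytes(-0.011762472800910473)
print(b.hex())                        # bc40b763: 0xBC carries sign+exponent

# Replace the last three bytes with arbitrary payload bytes:
tampered = bytes_f32(b[:1] + b"\x7f\x11\x22")
print(tampered)                       # still on the order of 1e-2
```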
3.2 Embedding methods
3.2.1 MSB Reservation
As the most important exponent part of a parameter is mainly located in the first byte, the first byte is the most significant byte in determining the parameter value. Therefore, we can keep the first byte unchanged and embed the malware in the last three bytes. In this way, the values of the parameters are still in a reasonable range. For example, for the parameter -0.011762472800910473 (0xBC40B763 in hexadecimal) in Fig. 2, if the last three bytes of the number are set to arbitrary values (i.e., 0xBC000000 to 0xBCFFFFFF), the parameter values stay between -0.0078125 and -0.0312499981374. Therefore, we can change the last three bytes of a parameter to malware bytes to embed the malware. We call this method "MSB reservation".
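A minimal sketch of MSB reservation follows. The function names and the bookkeeping (returning the number of parameters used, needed later for extraction) are ours, not the paper's exact implementation.

```python
import struct

# "MSB reservation" sketch: keep the first (most significant) byte of
# each 32-bit parameter and overwrite the remaining three bytes with
# malware payload bytes.

def msb_reservation_embed(params, payload):
    """Embed payload into params, 3 bytes per parameter; returns the new
    params and the number of parameters used (needed for extraction)."""
    chunks = [payload[i:i + 3].ljust(3, b"\x00")
              for i in range(0, len(payload), 3)]
    assert len(chunks) <= len(params), "not enough parameters"
    out = list(params)
    for i, chunk in enumerate(chunks):
        first = struct.pack(">f", params[i])[:1]   # keep the MSB
        out[i] = struct.unpack(">f", first + chunk)[0]
    return out, len(chunks)

def msb_reservation_extract(params, n_used, payload_len):
    data = b"".join(struct.pack(">f", params[i])[1:] for i in range(n_used))
    return data[:payload_len]

weights = [-0.0117, 0.0253, -0.0081, 0.0145]       # toy parameters
embedded, n = msb_reservation_embed(weights, b"\xde\xad\xbe\xef\x90")
assert msb_reservation_extract(embedded, n, 5) == b"\xde\xad\xbe\xef\x90"
```

Because three of every four bytes carry payload, the theoretical embedding rate of this scheme is 75%, matching the bound stated in Sec. 4.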
3.2.2 Fast Substitution
We further analyzed the parameter distribution in the above neuron. If we keep the first byte of the parameter at 0x3C or 0xBC (differing only in the sign bit) and set the rest of the parameter to arbitrary values (i.e., 0x3C000000 to 0x3CFFFFFF, or 0xBC000000 to 0xBCFFFFFF), the parameter values are between 0.0078125 and 0.0312499981374, or -0.0312499981374 and -0.0078125. We found that 62.65% of the parameters in the neuron fall within this range. Therefore, if we replace the parameters with three bytes of malware and a prefix byte of 0x3C or 0xBC chosen according to their signs, most parameter values are still within a reasonable range. Compared with MSB reservation, this method may have a larger impact on the model performance, but as it does not need to disassemble the parameters in the neuron, it works faster than MSB reservation. We call this method "fast substitution".
3.2.3 Half Substitution
Moreover, if we keep the first two bytes unchanged and modify only the remaining two bytes, the value of the number fluctuates within a much smaller range. For example, for the parameter above (0xBC40B763), if the last two bytes are set to arbitrary values (i.e., 0xBC400000 to 0xBC40FFFF), the values are between -0.01171875 and -0.0117797842249, a tiny interval. As four digits after the decimal point remain the same, the impact of embedding is smaller than with the methods above. However, as only two bytes are replaced in each parameter, this method embeds less malware than fast substitution and MSB reservation. We call this method "half substitution".
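The narrow-interval property of half substitution can be verified directly; the helper below is an illustrative sketch, reusing the sample parameter 0xBC40B763 from above.

```python
import struct

# "Half substitution" sketch: keep the first two bytes of a parameter
# (sign, exponent and the top mantissa bits) and replace only the last
# two bytes with payload, so the value stays in a tiny interval.

def half_substitute(param: float, chunk2: bytes) -> float:
    head = struct.pack(">f", param)[:2]        # preserved half
    return struct.unpack(">f", head + chunk2)[0]

p = -0.011762472800910473                      # 0xBC40B763
lo = half_substitute(p, b"\x00\x00")           # 0xBC400000
hi = half_substitute(p, b"\xff\xff")           # 0xBC40FFFF
print(lo, hi)                                  # both ~ -0.0117
assert abs(lo - p) < 1e-4 and abs(hi - p) < 1e-4
```

Two payload bytes per four-byte parameter give the 50% theoretical embedding rate quoted for half substitution in Sec. 4.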
3.3 Trigger Design
We design a trigger that can resist analysis by defenders, based on DeepLocker. We also use the non-enumerable attributes of the target to train the model and use the steady output of the model as a trigger to activate the malware extraction. As the neural network model is irreversible, defenders cannot infer the target from the trigger.
Suppose we have a group of targets T, drawn from all the EvilModel users. We collect the targets' attributes A(T). Our goal is to train a DNN model that fits f(x; w) = y_t for x in A(T), where x is the input and f is the transformation of the model with weights w. We also need to maintain the performance of the model and ensure the model won't mistakenly recognize others as the target, that is, f(x; w) != y_t for x not in A(T). We convert the model outputs to a feature vector with a converting function C, i.e., v = C(f(x; w), theta), v in V, where V is the set of feature vectors from the model output and theta is a threshold for generating the vectors. We use v_t as the feature vector of the targets. If the model output can be converted to the same vector, the target is considered found, which triggers the extraction of the malware. Therefore, the trigger condition is C(f(x; w), theta) = v_t.
In the implementation, we used data from VGG-Face  to train a DNN model. The model accepts an input image of size 40x40 and produces 128 outputs. We set the target as David_Schwimmer, and the goal of the model is to produce a steady output when the target is input. When the model converges, the output is steady. We defined C as a binary conversion function: for each of the 128 outputs, C converts it to 0 or 1 according to the threshold. The 128 0s and 1s form the vector v. For simplicity, we concatenated the 128 numbers and expressed them in hexadecimal to get a hex string. We used this string to determine whether the target (David_Schwimmer) is found.
3.4 Attack Framework
The framework of the attack is shown in Fig. 4, which mainly contains the following steps:
(1) Prepare the DNN model and the malware. In this step, the attackers prepare well-trained DNN models and malware for specific tasks. The attackers can design their own networks or download well-trained models from public repositories. The attackers should evaluate the structure and size of the DNN model to decide how much malware can be embedded. They can also develop, download or buy malware for their tasks.
(2) Embed the malware into the model. The attackers can embed the malware using different methods. If the malware is large, the attackers should evaluate the performance of the malware-embedded model to ensure there is no huge degradation in the performance. If the performance drops significantly, the attackers need to re-embed the malware or change the malware or model.
(3) Design the trigger. After embedding the malware, the attackers need to design the trigger according to the model’s output. The attackers convert the output to feature vectors to find the targets and activate the targeted attack.
(4) Deliver EvilModel. The attackers can upload the EvilModels to public repositories, cloud storage servers, DNN markets, etc., and spread them through supply chain pollution or similar approaches.
(5) Activate the malware. When the EvilModels are running on end devices, they can automatically find the targets, extract the malware from the EvilModels and execute the malware by the defined conditions.
4 Experiments and Evaluation
The experiments were implemented with PyTorch 1.8 and CUDA 10.2. The code was run on Ubuntu 20.04 with 1 Intel Xeon Silver 4210 CPU (2.20GHz) and 4 GeForce RTX 2080 Ti GPUs. We collected 10 pre-trained DNN models from PyTorch public model repositories and 19 malware samples from advanced malware campaigns from InQuest  and Malware DB . They are of different sizes, as shown with the results in Sec. 4.2. We used the proposed methods to embed the samples into the models. In total, we created 579 EvilModels. During the embedding, the network layers and replaced neurons were logged to a file. After the embedding, we used the log file to configure the extraction parameters and extract the malware. We compared the SHA-256 hashes of the extracted malware with the original malware, and they were identical, which means the embedding and extraction processes are correct. The performance of the original models and EvilModels was tested on the ImageNet dataset .
The testing accuracy of EvilModels with MSB reservation, fast substitution and half substitution is shown in Table 1, Table 2 and Table 3, respectively, along with the malware samples and their sizes and the DNN models and their sizes. "Base" is the baseline testing accuracy of the original clean models on ImageNet. The models are arranged in decreasing order of size, and the malware samples in increasing order of size. Bold values mean that the accuracy has dropped too much, and a dash indicates that the malware cannot be embedded in the model.
The results for MSB reservation are shown in Table 1. Due to the fault tolerance of DNN models, the testing accuracy of large-sized models is not affected when the malware is embedded. The accuracy even increased slightly with a small amount of malware embedded in some cases (e.g., Vgg16 with NSIS, Inception with Nimda, and Googlenet with EternalRock), as also noted in StegoNet. When embedding with MSB reservation, the accuracy drops as the embedded malware size increases for medium- and small-sized models. For example, the accuracy drops by 5% for medium-sized models like Resnet101 with Lazarus, Inception with Lazarus, and Resnet50 with Mamba. Theoretically, the maximum embedding rate of MSB reservation is 75%. In the experiment, the upper bound on the embedding rate without severe accuracy degradation was 25.73% (Googlenet with Artemis).
Table 2 shows the results for fast substitution. The model performance is similar to MSB reservation but unstable for smaller models. When larger malware is embedded in a medium- or small-sized model, the performance drops significantly. For example, for Googlenet with Lazarus, the testing accuracy drops sharply to 0.526%. For Squeezenet, although the testing accuracy declines as the malware size increases, it also fluctuates. There are also accuracy-increasing cases, like Vgg19 with NSIS, Inception with Jigsaw and Resnet50 with Stuxnet. This shows that fast substitution can serve as a substitute for MSB reservation when the model is large or the task is time-sensitive. In the experiment, the embedding rate without severe accuracy degradation was 15.9% (Resnet18 with VikingHorde).
Table 3 shows the results for half substitution. Due to the redundancy of DNN models, there is nearly no degradation in the testing accuracy for models of all sizes. The accuracy fluctuates within about 0.01% of the baseline. Even the small-sized Squeezenet (4.74MB) can embed a 2.3MB Mamba sample with the accuracy increasing by 0.048%. Half substitution shows great compatibility with different models. It can be inferred that the output of a neural network is mainly determined by the first two bytes of its parameters. This also leaves open the possibility of improving model performance by analyzing and modifying model parameters. Theoretically, the maximum embedding rate of half substitution is 50%. In the experiment, we came close to the theoretical value at 48.52% (Squeezenet with Mamba).
We uploaded some of the EvilModels to VirusTotal  to check whether the malware can be detected. The models were recognized as zip files by VirusTotal. 58 anti-virus engines were involved in the detection, and nothing suspicious was detected, which means the EvilModels can evade security scans by common anti-virus engines.
The results show that half substitution outperforms MSB reservation and fast substitution. There is no apparent difference for larger models. However, as the embedding limit is approached for smaller models, the models' performance changes differently under the different methods: replacing three bytes does more harm than replacing two. It also remains possible to reach an embedding rate higher than 50% if suitable encoding methods are applied.
A comparison with StegoNet is performed to evaluate the performance of EvilModel. The models and malware samples used in both EvilModel and StegoNet are selected. The comparison result is shown in Table 4.
As MSB reservation and fast substitution replace three bytes at a time, they have the highest embedding rates of all the methods but also a higher impact on the model performance. For Mobilenet with VikingHorde, although the embedding rate is 52.59%, the testing accuracy is just over 0.1%; for ImageNet with 1,000 classes, this is only slightly better than random guessing. Considering the changes in accuracy, resilience training, fast substitution and MSB reservation have similar embedding rates and impacts on the models. LSB substitution has a higher embedding rate, but also a higher impact on the model. Conversely, value-mapping and sign-mapping have a lower impact on the model but also a lower embedding rate. Moreover, the mapping methods need an index permutation to restore the embedded malware, which is not practical. Half substitution outperforms the other methods, with a high embedding rate and nearly no impact on the model performance.
To evaluate the embedding methods quantitatively, we propose an evaluation indicator combining the performance impact, the embedding rate and the embedding effort. For the performance impact, we use the drop in testing accuracy. Let A_b be the baseline accuracy of a model m on a given task t, and A_e be the testing accuracy of m on t with a malware sample s embedded; the accuracy loss is then A_b - A_e. For normalization, we use P = (A_b - A_e) / A_b to denote the impact. Let S_m be the size of the model m and S_s the size of the malware sample s; the embedding rate is R = S_s / S_m. Considering the embedding effort (the extra workloads and information needed for embedding and extraction), we introduce a penalty factor gamma. A better embedding method should have a lower impact (P), a higher embedding rate (R) and less embedding effort (gamma). Therefore, considering the needs of different tasks, we define the embedding quality as E = R^alpha / (gamma * (P + epsilon)^(1 - alpha)),
where alpha is a coefficient indicating the relative importance of the impact P and the embedding rate R, and epsilon is a constant to prevent zero denominators and balance the gap brought by small accuracy losses. The higher the value of E, the better the embedding method performs on model m with sample s.
We set the target task t as the ImageNet task. In the evaluation, we consider the impact and the embedding rate equally important and set alpha = 0.5. We let P = 0 if the calculated P < 0, to eliminate the subtle influence of negative values. If the model m is incapable of embedding the malware s, we set the embedding rate R = 0 and the impact P = 1, which results in the lowest E. We consider the embedding itself the basic workload and set the default gamma = 1. Extra work (like retraining the model or maintaining an index permutation) is punished with a 0.1 increment on gamma. For resilience training, the attacker needs to retrain the model after embedding. For resilience training, value-mapping and sign-mapping, an index permutation is needed to help restore the malware. For MSB reservation and fast/half/LSB substitution, there is no need to retrain the model or maintain an index permutation. Therefore, we set gamma = 1 for MSB reservation and fast/half/LSB substitution, gamma = 1.1 for value-mapping and sign-mapping, and gamma = 1.2 for resilience training.
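The evaluation setup above can be sketched numerically. The formula shape (E = R^alpha / (gamma * (P + epsilon)^(1 - alpha))) follows our reading of the definitions in this section, and all the sample numbers are hypothetical, not measurements from the paper.

```python
# Illustrative computation of the embedding quality indicator E.
# P: normalized accuracy impact, R: embedding rate, gamma: effort penalty.

def embedding_quality(acc_base, acc_embed, model_size, malware_size,
                      gamma=1.0, alpha=0.5, eps=1e-3):
    P = max((acc_base - acc_embed) / acc_base, 0.0)   # clip negative impact
    R = malware_size / model_size                      # embedding rate
    return R**alpha / (gamma * (P + eps)**(1 - alpha))

# Hypothetical: a 2.3 MB sample in a 4.74 MB model with no accuracy drop
# and no extra effort (gamma = 1), e.g. half substitution.
print(embedding_quality(0.580, 0.580, 4.74, 2.3))

# Hypothetical: the same embedding with a 5% relative accuracy drop and
# two extra workloads (gamma = 1.2), e.g. resilience training.
print(embedding_quality(0.580, 0.551, 4.74, 2.3, gamma=1.2))
```

As expected from the definition, a lossless, effortless embedding scores far higher than one that costs accuracy and requires retraining plus an index permutation.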
The evaluation results are also shown in the last two columns of Table 4, where AVG(E) is the average embedding quality of an embedding method on a model with the given malware samples, and the overall average is the mean of the AVG(E) values. For larger DNN models, the E values are similar and at the same level, because a large model has more redundant space to embed the malware. For smaller models, the E values differ considerably across the embedding methods: the large malware samples approach the model's embedding limit, so the model's performance declines rapidly as the malware size increases, and different embedding methods affect the model differently, which yields different E values.
Half substitution has the highest E of all the methods, which means it is the best embedding method: it has a lower impact on the model and a higher embedding rate. Then come MSB reservation and value-mapping; MSB reservation has a higher embedding rate, and value-mapping has a lower impact. Fast substitution and resilience training come next. Resilience training has a lower impact, but its workload is higher than that of the other methods. Sign-mapping has the lowest E.
5 Case Study: Trigger the Malware
This section presents a case study on a potential targeted attack scenario based on EvilModel. We followed the framework in Sec. 3.4 to build the scenario. First, we trained a CNN-based neural network model to identify the target. Then we embedded a WannaCry malware sample in the model using half substitution and evaluated the performance of the malware-embedded model. Meanwhile, we used the output of the model's penultimate layer to make up a trigger that activates the extraction. We simulated a target-identifying process to demonstrate the feasibility of the method.
5.1 EvilModel Preparation
In this part, we train a DNN model to identify the target and design a trigger to activate the malicious behaviour. The training aims to fit an objective function that, given the model's weights and an input composed of the attributes of the target, identifies the target reliably. A converting function is needed to convert the model's output into a feature vector, and this feature vector is regarded as the trigger. The model should produce a steady output whenever the target is identified, so that the feature vector remains unchanged and activates the extraction stably.
We set the target as “David Schwimmer”, and the malware will be executed if David Schwimmer is found. To this end, we built a CNN-based model. It has 7 layers, including four convolution layers, two fully connected hidden layers, and one fully connected output layer. Batch normalization is applied on each layer except the output layer, and ReLU is used as the activation function between layers. Dropout is applied on the linear layers. The model accepts an input image of size 40x40 and has two outputs that decide whether the input is the target. The penultimate layer of the model has 128 neurons and produces 128 outputs. We treated these 128 outputs as the source of the feature vector. For simplicity, we built the converting function on the sign function: each element of the feature vector is set to 1 if the corresponding output is positive, and 0 otherwise.
As the output has 128 elements, the feature vector will consist of 128 0s and 1s. We concatenated the bits and converted the binary string to a hexadecimal string. Therefore, the feature vector appears as a hexadecimal string of length 32.
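The conversion can be sketched as follows; the function name is ours, but the logic follows the sign-based binarization and hexadecimal packing described above.

```python
def feature_vector(outputs):
    """Binarize the 128 penultimate-layer outputs with the sign function
    and pack the bits into a 32-character hexadecimal string."""
    assert len(outputs) == 128
    bits = "".join("1" if v > 0 else "0" for v in outputs)
    return "{:032x}".format(int(bits, 2))

print(feature_vector([1.5] * 128))  # 'ffffffffffffffffffffffffffffffff'
```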
We used the VGG-Face dataset to train the model. Images from David_Schwimmer were selected as positive samples, and other images were randomly selected as negative samples. Because VGG-Face was created years ago, many image links have become invalid or mislabeled. We obtained only 340 positive samples, which is insufficient to support the experiment. We therefore retrieved “David Schwimmer” through search engines and obtained 477 images as a supplement to the positive samples. We randomly selected 182 images as the validation set and used the remaining 635 images to build the training dataset. We used MTCNN [43, 46] for face detection on each image. As the model accepts a minimum input size of 40x40, we filtered out smaller faces and obtained 583 positive face samples from the 635 images. We applied the same method to the negative samples and got 2,348 faces. To balance the positive and negative samples, we used image data augmentation to expand the dataset: we applied flip, brightness, and saturation adjustments to the positive samples and finally got 2,332 positive samples. We set the ratio of the training set to the test set to 3:1 and built a training set with 3,500 faces (1,750 positive and 1,750 negative samples) and a test set with 1,180 faces (582 positive and 598 negative samples).
We used the Adam optimizer during the training. After around 500 epochs, we got a model with a testing accuracy of 99.15%. The size of the model is 50.5MB. We conducted a stability test of the feature vector on the validation set. We first used MTCNN to detect faces in the validation-set images. If a face with a size greater than 40x40 was found, it was used to determine whether it was the target; if multiple faces were detected, the face with the highest confidence was selected for the identification. Among the 182 images in the validation set, 177 contained target faces with a size greater than 40x40, and 174 of them generated the same feature vector, a stability rate of 98.3%.
A malware sample, WannaCry, was then embedded in the model using half substitution. After embedding, the performance of the model was tested: both the testing accuracy and the feature-vector stability rate remained the same as in the original model, and the feature vector itself was unchanged. We extracted the malware sample from the model and calculated its SHA-256 hash, which matched the hash of the original WannaCry sample. The extraction does not rely on index permutations and completes automatically. Finally, we used the WannaCry-embedded model and the feature vector “0x5151e888a773f4675002a2a6a2c9b091” to identify the target. The poor explainability of the neural network model and the second-preimage resistance of the hash function improve the safety of the malware.
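A byte-level sketch of half substitution follows. This is our own minimal illustration operating on raw float32 bytes rather than a real PyTorch model: two of every four parameter bytes are overwritten with payload bytes, and extraction plus a SHA-256 comparison recovers and verifies the sample, without any index permutation.

```python
import hashlib
import struct

def embed_half(params, payload):
    """Overwrite the two low-order bytes of each float32 parameter
    with two payload bytes (half substitution)."""
    out = list(params)
    for i in range(0, len(payload), 2):
        raw = bytearray(struct.pack("<f", out[i // 2]))
        raw[0:2] = payload[i:i + 2].ljust(2, b"\x00")
        out[i // 2] = struct.unpack("<f", bytes(raw))[0]
    return out

def extract_half(params, size):
    """Reassemble `size` payload bytes from the modified parameters."""
    blob = b"".join(struct.pack("<f", p)[:2] for p in params)
    return blob[:size]

payload = b"dummy payload standing in for WannaCry"  # toy stand-in sample
stego = embed_half([0.05] * 64, payload)             # toy parameter vector
recovered = extract_half(stego, len(payload))
assert hashlib.sha256(recovered).digest() == hashlib.sha256(payload).digest()
```

Because only the low-order mantissa bytes change, each parameter keeps its sign and exponent, which is why the perturbation to the model is small.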
5.2 EvilModel Execution
We set up an application scenario in which benign video software and the malware are bundled together. The video software captures images, and the malware accurately identifies the target and launches the attack. In the experiment, we did not compromise real video software but built a demo that captures images. The workflow of the demo is shown in Fig. 5. MTCNN detected faces in the captured images. Valid faces (larger than 40x40) were processed by the EvilModel to get the feature vector. If the feature vector matched the trigger multiple times, the extraction was activated. If the extracted malware was identical to the embedded one, it was executed.
We printed pictures of David Schwimmer and other celebrities as input, including single photos and group photos with different genders and styles. An image was captured every 3 seconds by a Logitech C922 webcam, as shown in Fig. 7. If the feature vector matched the trigger, a counter was increased by 1; otherwise, it was decreased by 1 until it reached 0. Once the counter reached the preset threshold, the extraction was activated. The extraction and hash calculation finished within 10 seconds, and then the WannaCry sample was executed on the target device, as shown in Fig. 7.
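The counter logic can be sketched as below; the threshold value of 3 is our assumption for illustration, as the text does not state the preset threshold.

```python
def trigger_loop(vectors, trigger, threshold=3):
    """Activate extraction only after repeated trigger matches; misses
    decrement the counter, but never below zero. The threshold of 3 is
    an assumed value for illustration."""
    counter = 0
    for v in vectors:
        counter = counter + 1 if v == trigger else max(counter - 1, 0)
        if counter >= threshold:
            return True
    return False
```

Requiring several matches before activation reduces false triggers from a single misidentified frame.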
This demo shows a potential risk of the malicious use of AI. A neural network model can be the host of malware and can deliver it without relying on extra information, such as an additional index permutation or payload. As a proof of concept, this demo does not resist forensic analysis: analysts can extract the malware sample and quickly come up with a response plan. Attackers could introduce encryption, obfuscation, and other methods to resist forensics. Since that is beyond this work's scope, we do not discuss it in detail here.
5.3 Further Exploration
In previous experiments, some malware-embedded models performed better than the original models, and we explored this phenomenon further. The penultimate layer of the model we trained has 128 neurons. We replaced the last neuron with a binary file of 2,770 bytes. As fast substitution has a higher impact on model performance, we used fast substitution to embed the malware. 2,049 parameters in this neuron were changed, including 2,048 connection weights and one bias: the first 924 parameters were replaced with the binary data, and the remaining parameters were padded with 0. After the embedding, we evaluated the performance of the model. The testing accuracy remained the same, but the confidence of the outputs was enhanced. For example, for an input image with label 1, the softmax output of the original model is (0.110759, 0.889241), while that of the modified model is (0.062199, 0.937801); the confidence of label 1 is enhanced. We compared the outputs before and after the modification of the penultimate layer and found that only the 128th output (from the modified neuron) changed, becoming much larger than the original output. Since the values from the 128th neuron are mainly positive, the increase promotes the discrimination of the last layer, which results in higher confidence on the given samples.
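The neuron-replacement arithmetic above can be checked directly, assuming fast substitution stores 3 payload bytes per 4-byte float32 parameter (an assumption consistent with the capacity figures reported later in Sec. 6):

```python
import math

PAYLOAD = 2_770        # bytes in the embedded binary file
PER_PARAM = 3          # assumed payload bytes per float32 (fast substitution)
NEURON_PARAMS = 2_049  # 2,048 connection weights + 1 bias

carrying = math.ceil(PAYLOAD / PER_PARAM)  # parameters holding payload data
padding = NEURON_PARAMS - carrying         # parameters padded with 0
print(carrying, padding)                   # 924 1125
```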
Since only one neuron was modified here, the performance of the model was not significantly affected. If more neurons are modified, and the modifications happen to be mostly positive, this effect will accumulate and eventually feed back into the model's performance. However, neuron-modifying methods like MSB reservation and fast substitution have more negative effects on performance, so after a large number of neuron modifications, the performance of the model will inevitably decline. In the following case study, we explore the impact of such modifications on the model's performance with another experiment.
6 Case Study: Embedding Capacity of DNN Model
In this section, we present a case study on the embedding capacity of a DNN model, as well as the impact of the embedding, with an experiment on AlexNet. AlexNet is a classic architecture for image classification: an 8-layer convolutional neural network with five convolution layers, two fully connected hidden layers, and one fully connected output layer. We trained it on Fashion-MNIST, a dataset of Zalando's article images consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image associated with a label from 10 classes.
In this case study, we show how the network layer, the architecture, and the amount of embedded malware relate to the decline in accuracy, and we explore possible methods to restore the performance of a malware-embedded model.
6.1.1 AlexNet Model
We chose to train an AlexNet model instead of using a pre-trained one. The network architecture was adjusted to fit the dataset: the input is a 224x224 1-channel grayscale image, and the output is a vector of length 10, representing the 10 classes. The images were resized to 224x224 before being fed into the net. Since the fully connected layers have more neurons and can embed more malware, we focus on them in the experiments. We named the fully connected layers FC.0, FC.1 and FC.2, respectively. FC.0 is the first fully connected hidden layer with 4,096 neurons; it receives 6,400 inputs from the convolution layers and generates 4,096 outputs. Each neuron in the FC.0 layer therefore has 6,400 connection weights, which means 18.75KB of malware can be embedded in an FC.0-layer neuron (at 3 bytes per 4-byte parameter). FC.1 is the second fully connected hidden layer with 4,096 neurons; it receives 4,096 inputs and generates 4,096 outputs, so 12KB of malware can be embedded in an FC.1-layer neuron. FC.2 is the output layer, receiving 4,096 inputs and generating 10 outputs; we kept it unchanged and focused mainly on FC.0 and FC.1 in the experiments.
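The per-neuron capacities follow from simple arithmetic, again assuming 3 payload bytes per 4-byte parameter; the sanity check against the figures reported later (1,025 FC.1 neurons carrying roughly 12MB) supports that assumption.

```python
PER_PARAM = 3  # assumed payload bytes embedded per 4-byte parameter

fc0_kb = 6_400 * PER_PARAM / 1024  # weights per FC.0 neuron -> KB
fc1_kb = 4_096 * PER_PARAM / 1024  # weights per FC.1 neuron -> KB
print(fc0_kb, fc1_kb)              # 18.75 12.0

# Sanity check against Sec. 6.2: 1,025 FC.1 neurons carry about 12MB.
print(1_025 * fc1_kb / 1024)
```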
Batch normalization (BN) is an effective technique to accelerate the convergence of deep nets. As the BN layer can be applied between the affine transformation and the activation function in a fully connected layer, we compared the performance of the models with and without BN on fully connected layers.
After around 100 epochs of training, we got a model with 93.44% accuracy on the test set without BN, and a model with 93.75% accuracy with BN, respectively. The size of each model is 178MB. The models were saved for later use.
6.1.2 Malware Samples
We used malware samples in advanced attack campaigns from InQuest  in this experiment. The malware samples come in different sizes and types. We uploaded the samples to VirusTotal , and all of them are marked as malicious (see Table 8). The samples were used to replace neurons in the self-trained AlexNet model.
Table 8 lists, for each sample, the first 4 bytes of its SHA-256 hash and its detection rate in VirusTotal (virus-reporting engines / all participating engines).
6.2 Malware Embedding
6.2.1 How much malware can be embedded in a layer?
This part explores how much malware can be embedded in a layer and how much the model's performance drops. We used samples 1-6 to replace 5, 10, …, 4,095 neurons in the FC.1 layer and samples 3-8 in FC.0, on AlexNet with and without BN, and recorded the accuracy of the modified models. As one sample can replace at most 5 neurons in FC.0 and FC.1, we repeatedly replaced neurons in the layer with the same sample until the number of replaced neurons reached the target. Finally, we got 6 sets of accuracy data and calculated the average of each. Fig. 8 shows the result.
It can be found that replacing a smaller number of neurons has little effect on the accuracy of the model. For AlexNet with BN, when 1,025 neurons (25%) in FC.1 are replaced, the accuracy can still reach 93.63%, which is equivalent to having embedded 12MB of malware. When 2,050 neurons (50%) are replaced, the accuracy is 93.11%. When more than 2,105 neurons are replaced, the accuracy drops below 93%; when more than 2,900 neurons are replaced, it drops below 90%, and from then on the accuracy decreases significantly as the number of replaced neurons increases. When more than 3,290 neurons are replaced, the accuracy drops below 80%. When all the neurons are replaced, the accuracy drops to around 10% (equivalent to random guessing). For FC.0, the accuracy drops below 93%, 90% and 80% when more than 40, 160 and 340 neurons are replaced, respectively. For AlexNet without BN, FC.1 still performs better than FC.0. However, while FC.1 without BN does not outperform FC.1 with BN, FC.0 without BN outperforms FC.0 with BN; FC.0 with BN seems to have “collapsed”. Detailed results are shown in Table 6.
Therefore, if an attacker wants to keep the model's accuracy loss within 1% while embedding as much malware as possible, no more than 2,285 neurons should be replaced on AlexNet with BN, which can embed about 26.8MB of malware.
Table 6 lists, for each layer, the number of replaced neurons at each accuracy level.
6.2.2 How is the impact on different layers?
In this part, we explore the impact of the embedded malware on different layers. Convolutional layers have far fewer parameters than fully connected layers, so it is not recommended to embed malware in them. However, to select the best layer, we still compared all the layers of AlexNet. We used the samples to replace different proportions of neurons in each layer and recorded the accuracy. As different layers have different numbers of parameters, we use percentages to indicate the number of replacements. The results are shown in Fig. 10. As the convolutional layers deepen, the replacement of neurons has a greater impact on model performance. For the fully connected layers, depth instead enhances their ability to resist neuron replacement, leaving the model performance less affected. For AlexNet both with and without BN, FC.1 has outstanding performance among all layers. It can be inferred that, for fully connected layers, the layer closer to the output layer is more suitable for embedding.
6.2.3 Can the lost accuracy be restored?
In this part, we explore the possibility of restoring the lost accuracy. An attacker can try to retrain the model if the accuracy drops too much. CNN-based models use backpropagation to update the parameters of each neuron. Neurons that should not be updated can be “frozen” (by setting the “requires_grad” attribute to “False” in PyTorch), so that their parameters are ignored during backpropagation and the embedded malware remains unchanged.
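The idea of freezing can be illustrated without PyTorch. The sketch below is our own pure-Python analogue of a gradient step that skips frozen parameters, mimicking the effect of `requires_grad=False` on the malware-carrying layer.

```python
def sgd_step(params, grads, frozen, lr=0.1):
    """One gradient step that skips frozen indices, mimicking how
    requires_grad=False keeps malware-carrying parameters untouched."""
    return [p if i in frozen else p - lr * g
            for i, (p, g) in enumerate(zip(params, grads))]

params = [1.0, 2.0, 3.0]
updated = sgd_step(params, [0.5, 0.5, 0.5], frozen={1})
print(updated)  # index 1 stays at 2.0; the others are updated
```

In the real experiment, freezing is applied to the whole malware-embedded layer, so retraining adjusts only the remaining layers.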
We selected the samples whose performance was close to the average accuracy and replaced 50, 100, …, 4,050 neurons in the FC.0 and FC.1 layers of the models with and without BN. Then we “froze” the malware-embedded layer and retrained the model on the training set for one epoch, logging the testing accuracy before and after retraining. After retraining each model, we extracted the embedded malware and calculated the hashes of the reassembled samples; they all matched the original hashes.
The left of Fig. 11 shows the accuracy change for the model without BN. The accuracy curves almost overlap, meaning the model's accuracy hardly changes. We retrained some models for more epochs, and the accuracy still showed no apparent increase. Therefore, for the model without BN in the fully connected layers, retraining after replacing the neuron parameters brings no obvious improvement in performance. For the model with BN, we applied the same retraining method and logged the accuracy, as shown in the right of Fig. 11. There is an apparent change in accuracy before and after retraining. For FC.0, the accuracy improves significantly after retraining. For FC.1, the accuracy also improves, although not as much as for FC.0. Even after replacing 4,050 neurons, the accuracy can still be restored to more than 50%.
If the attacker uses the model with BN and retraining to embed malware in FC.1 and wants to keep the accuracy loss within 1%, more than 3,150 neurons can be replaced, resulting in about 36.9MB of embedded malware. If the attacker wants to keep the accuracy above 90%, 3,300 neurons can be replaced, which can embed 38.7MB of malware.
The experiment shows the relationship between the amount of embedded malware and the impact on the model's performance. As the amount of embedded malware increases, the model's performance trends downward, and this trend behaves differently on each layer: different network layers have different fault tolerance to neuron changes. It also shows that in a CNN-based network, the convolutional layers matter more for classification than the fully connected layers. The architecture of the neural network affects the model's performance as well; batch normalization can be introduced when designing the network to improve its performance and fault tolerance. Finally, if the model's performance drops too much, it can be restored by retraining the model (like “fine-tuning” in regular deep learning tasks).
7 Possible Countermeasures
Although malware can be embedded in DNN models, there are still ways to mitigate such attacks. Firstly, a malware-embedded model cannot be modified: once the malware is embedded, any change to the parameters risks damaging the malware and breaking its integrity. Therefore, professional users who adapt models to different tasks can change the parameters through fine-tuning, pruning, model compression, etc., thereby breaking the malware structure and preventing the malware from being recovered. Also, a loader is needed to extract the embedded malware; even if the loader and trigger are encoded in the supporting code, professionals still have a high probability of noticing and removing them. However, non-professional users may lack the relevant background to find the abnormal code or change the model, so they may still face such attacks.
Secondly, delivering malware-embedded models requires methods like supply chain pollution. If users download the required model from a trusted party and check the model's integrity, the probability of a successful attack decreases. At the same time, DNN model markets and service providers should improve user identity verification and allow only verified users to upload models. If possible, uploaded models should be verified before ordinary users can download them. Security protection of the platforms themselves is also necessary, to keep attackers from abusing them to replace models.
Comparing the entropy of normal and malware-embedded models may be another way to mitigate this kind of attack. We selected models and malware samples of different sizes and compared the entropy of each model before and after embedding. We found that the entropy after embedding is generally larger than before. The results are shown in Table 7, where bold values indicate an entropy smaller than that of the clean model and a dash means the malware cannot be embedded in the model. Although the increase is insignificant, it shows the possibility of detecting EvilModels using information entropy.
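A defender can compute the byte-level Shannon entropy of a model file with a few lines; the sketch below is a minimal illustration of the measurement, not the paper's exact tooling.

```python
import math
from collections import Counter

def byte_entropy(blob):
    """Shannon entropy of a byte string, in bits per byte (0..8)."""
    n = len(blob)
    return sum(-c / n * math.log2(c / n) for c in Counter(blob).values())

print(byte_entropy(b"\x00" * 1024))         # 0.0 (constant bytes)
print(byte_entropy(bytes(range(256)) * 4))  # 8.0 (uniform bytes)
```

Malware bytes tend to look more random than trained float parameters, which is why embedding usually nudges the file's entropy upward.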
None: clean model; MSB: MSB reservation; Fast: fast substitution; Half: half substitution.
As malware-embedded models are used on end devices, the models should be verified before applications launch them; if a model is not certified, the application should not launch the related functions. Also, since the embedded malware is assembled and executed on the target devices, protection from antivirus software is necessary: the malware can be detected and analyzed using traditional methods like static and dynamic analysis, heuristics, etc. Meanwhile, security vendors can propose solutions to this type of attack and update the detection rules in their analysis libraries to help ordinary users deal with such threats.
The preparation, delivery, and execution of malware also form a supply chain, and defending at any link in this chain is better than no defense. Security is an accompanying technology: only when every link is secure can the system be even temporarily secure. However, offense and defense change dynamically, and we need to respond to changes constantly and in a timely manner to reduce security risks as much as possible.
8 Conclusion
This paper proposed three methods to hide malware inside neural network models with a high embedding rate and a low impact on model performance. We applied the embedding methods to 10 mainstream models with 19 malware samples to show their feasibility. A quantitative method was proposed to evaluate the existing embedding techniques by both the embedding rate and the impact on the models. This paper also designed a hidden trigger and presented a stealthy DeepLocker-style attack using half substitution to demonstrate the potential threat of the proposed scenario. We further explored the embedding capacity of a DNN model and studied the fault tolerance of different network layers through an experiment on AlexNet, and tried to restore the lost performance by retraining the model.
This paper shows that the redundant parameters in regular models can be replaced with malware bytes or other information while maintaining the model's performance without noticeable change: DNN models can become a carrier for information hiding. Combined with other advanced malware technologies, the proposed methods may pose a significant threat to computer security. As neural networks can also be used maliciously, AI-assisted attacks will emerge and bring new challenges for computer security as AI becomes more popular. Network attack and defense are interdependent, and it is worthwhile for the security community to respond to this new type of attack and propose solutions as early as possible. We hope the proposed scenario will contribute to future protection efforts, and we believe countermeasures against AI-assisted attacks will be deployed in the future.
This paper is an extended version of work first presented in September 2021 at the 26th IEEE Symposium on Computers and Communications (ISCC 2021). We thank the anonymous reviewers from IEEE ISCC 2021 for their insightful comments.
- (Website) Cited by: §4.2, §6.1.2.
- (2015) Strategic marketing management: achieving superior business performance through intelligent marketing strategy. Procedia-Social and Behavioral Sciences 207, pp. 125–134. Cited by: §1.
- (Website) Cited by: §1.
- Learning to evade static PE machine learning malware models via reinforcement learning. CoRR abs/1801.08917. Cited by: §2.4.
- (2018) The malicious use of artificial intelligence: forecasting, prevention, and mitigation. CoRR abs/1802.07228. Cited by: §1, §2.4.
- (2006) Model compression. In Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, August 20-23, 2006, pp. 535–541. Cited by: §7.
- (Website) Cited by: §1.
- (2016) Can we open the black box of AI?. Nature News 538 (7623), pp. 20. Cited by: §1.
- (2019-07) (Website) Cited by: §3.1.2.
- (2018) DeepLocker - concealing targeted attacks with AI locksmithing. Technical report, IBM Research. Cited by: §1, §2.3.
- (Website) Cited by: §1.
- (2020) A survey of deep learning techniques for autonomous driving. Journal of Field Robotics 37 (3), pp. 362–386. Cited by: §1.
- (2019) PassGAN: a deep learning approach for password guessing. In Applied Cryptography and Network Security - 17th International Conference, ACNS 2019, Bogota, Colombia, June 5-7, 2019, Proceedings, Lecture Notes in Computer Science, Vol. 11464, pp. 217–237. Cited by: §2.4.
- (2017) Generating adversarial malware examples for black-box attacks based on GAN. CoRR abs/1702.05983. Cited by: §2.4.
- (2012-05) (Website) Cited by: §4.1.
- (2021) (Website) Cited by: §4.1, §6.1.2.
- (2016) A deep learning approach for network intrusion detection system. EAI Endorsed Transactions on Security and Safety 3 (9), pp. e2. Cited by: §1.
- (2015) Adam: a method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings. Cited by: §5.1.2.
- (2012) ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25, pp. 1097–1105. Cited by: §6.
- (2020-02) (Website) Cited by: §1.
- (2020) StegoNet: turn deep neural network into a stegomalware. In Annual Computer Security Applications Conference, ACSAC ’20, New York, NY, USA, pp. 928–938. Cited by: §1, §2.2.
- (2016) Applications of deep learning in biomedicine. Molecular Pharmaceutics 13 (5), pp. 1445–1454. Cited by: §1.
- (Website) Cited by: §1.
- (2012) Machine learning: a probabilistic perspective. MIT Press. Cited by: §5.1.2.
- (2007) Implementation of LSB steganography and its evaluation for various bits. In 2006 1st International Conference on Digital Information Management, pp. 173–178. Cited by: §2.1.
- Deep learning for emotion recognition on small datasets using transfer learning. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, pp. 443–449. Cited by: §7.
- (2015) Deep face recognition. In British Machine Vision Conference. Cited by: §3.3, §5.1.2.
- (2005) Second preimage resistance. In Encyclopedia of Cryptography and Security, pp. 543–544. Cited by: §5.1.3.
- (1993) Pruning algorithms - a survey. IEEE Transactions on Neural Networks 4 (5), pp. 740–747. Cited by: §7.
- (2017-12) (Website) Cited by: §6.
- (2018) Bringing a GAN to a knife-fight: adapting malware communication to avoid detection. In 2018 IEEE Security and Privacy Workshops, SP Workshops 2018, San Francisco, CA, USA, May 24, 2018, pp. 70–75. Cited by: §2.4.
- (2007) Pattern recognition. Wiley Encyclopedia of Computer Science and Engineering. Cited by: §1.
- Weaponizing data science for social engineering: automated E2E spear phishing on Twitter. Black Hat USA 37, pp. 1–39. Cited by: §2.4.
- (1987) The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association 82 (398), pp. 528–540. Cited by: §5.1.2.
- Steganographic generative adversarial networks. In Twelfth International Conference on Machine Vision, ICMV 2019, Amsterdam, The Netherlands, 16-18 November 2019, SPIE Proceedings, Vol. 11433, pp. 114333M. Cited by: §2.1.
- (2021) Crafting adversarial examples to bypass flow- & ML-based botnet detectors via RL. In RAID ’21: International Symposium on Research in Attacks, Intrusions and Defenses. Cited by: §2.4.
- (2020) DeepC2: AI-powered covert botnet command and control on OSNs. arXiv preprint arXiv:2009.07707. Cited by: §1, §2.4.
- (2021-09) EvilModel: hiding malware inside of neural network models. In 2021 IEEE Symposium on Computers and Communications (ISCC) (IEEE ISCC 2021), virtual. Cited by: §8.
- (2021) A network security situation assessment method based on adversarial deep learning. Applied Soft Computing 102, pp. 107096. Cited by: §1.
- (2021) (Website) Cited by: §4.1.
- (2020) Black-box adversarial attacks against deep learning based malware binaries detection with GAN. In ECAI 2020 - 24th European Conference on Artificial Intelligence, Including 10th Conference on Prestigious Applications of Artificial Intelligence (PAIS 2020), Frontiers in Artificial Intelligence and Applications, Vol. 325, pp. 2536–2542. Cited by: §2.4.
- (2014) Droid-Sec: deep learning in Android malware detection. In Proceedings of the 2014 ACM Conference on SIGCOMM, pp. 371–372. Cited by: §1.
- (2016) Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters 23 (10), pp. 1499–1503. Cited by: §5.1.2.
- (2019) HOBA: a novel feature engineering methodology for credit card fraud detection with a deep learning architecture. Information Sciences. Cited by: §1.
- (2018) Adversarial examples against deep neural network based steganalysis. In Proceedings of the 6th ACM Workshop on Information Hiding and Multimedia Security, Innsbruck, Austria, June 20-22, 2018, pp. 67–72. Cited by: §2.1.
- (Website) Cited by: §5.1.2.