DeepCloak: Adversarial Crafting As a Defensive Measure to Cloak Processes

Over the past decade, side-channels have proven to be significant and practical threats to modern computing systems. Recent attacks have all exploited the underlying shared hardware. While practical, mounting such a complicated attack is still akin to listening in on a private conversation in a crowded train station. The attacker has to either perform significant manual labor or use AI systems to automate the process. The recent academic literature points to the latter option. With the abundance of cheap computing power and the improvements made in AI, it is quite advantageous to automate such tasks. By using AI systems, however, malicious parties also inherit their weaknesses. One such weakness is undoubtedly the vulnerability to adversarial samples. In contrast to the previous literature, for the first time, we propose the use of adversarial learning as a defensive tool to obfuscate and mask private information. We demonstrate the viability of this approach by first training CNNs and other machine learning classifiers on leakage traces of different processes. After training highly accurate models (99+% accuracy), we investigate their resilience against adversarial learning methods. By applying minimal perturbations to the input traces, the defender can run the adversarial noise as an attachment to the original process and cloak it against a malicious classifier. Finally, we investigate whether an attacker can protect her classifier model by employing adversarial defense methods, namely adversarial re-training and defensive distillation. Our results show that even in the presence of an intelligent adversary that employs such techniques, all 10 of the tested adversarial learning methods still manage to successfully craft adversarial perturbations, and the proposed cloaking methodology succeeds.


1 Introduction

Deep Learning (DL) has proven to be a very powerful tool for a variety of tasks like handwritten digit recognition, image classification and labeling, speech recognition, lip reading, verbal reasoning, self-driving cars, playing competitive video games and even writing novels [27, 54, 21, 70, 26, 4, 69, 39, 41, 67, 47]. As expected with any booming technology, it is also utilized by malicious actors. For instance, there have been cases of AI-generated content on Internet boards intended to create or shift public opinion through social engineering. The latest and most notorious known example is the AI-generated fake comments on the FCC net-neutrality boards, where millions of fake messages were posted [33, 61]. These AI-crafted messages were grammatically and semantically sound and not easily detectable as fake by a human observer.

Yet another malicious use of AI is for spam and phishing attacks. Using the sentiment analysis power of well-trained models, hackers are now crafting tailor-made phishing e-mails that have higher 'yield' rates than human-crafted phishing e-mails [59, 17]. Especially with the abundance of cheap computing power, bulk creation of such e-mails has become easier than ever. Beyond crafting semantically sound text, modern AI systems can even participate in hacking competitions where human creativity and intuition were thought to be irreplaceable. In 2016, DARPA sponsored a hacking competition for AI systems alone. The competition task was to find and fix vulnerabilities in computer systems within a given time period [34]. Moreover, according to a survey among cybersecurity experts, the use of AI for cyberattacks will become more common with time, and the use of AI in cyberattacks is inevitable [64].

As evidenced by the surge of AI use in cyberattacks, AI is now a tool in the hackers' toolbox, and there need to be systems in place against such AI-capable attackers. Expanding on this thought, we investigate the problem with a focus on side-channel attacks. Side-channel attacks in general deal with bulk amounts of noisy data that require human interpretation and intuition to process. Such tasks are perfectly suited for AI systems, and it is reasonable to expect that malicious parties are aware of this opportunity. We believe that the use of AI in side-channel attacks is a realistic and practical expectation. The academic literature already shows the use of AI for processing side-channel leakage.

In 2011, Hospodar et al. [29] demonstrated the first use of machine learning, LS-SVM specifically, in a power side-channel attack on AES and showed that the ML approach yields better results than traditional template attacks. Later, Heuser et al. [28] showed the superiority of multi-class SVMs over template attacks for noisy data. Also in 2012, Zhang et al. [72] demonstrated the use of multi-class SVMs to extract RSA decryption keys from noisy side-channel leakage data. In addition to SVMs, Neural Networks (NNs) are also a popular tool among side-channel researchers. Martinasek et al. [44, 43] showed that NNs can be used to classify AES keys from power measurements with a success rate of 96%. Moreover, in 2015, Beltramelli [5] used an LSTM NN to collect meaningful keystroke data from the motions of smart watch users. In 2016, Maghrebi et al. [42] compared four DL-based techniques with template attacks on an unprotected AES implementation using power consumption and showed that CNNs outperform template attacks. Finally, in 2017, Gulmezoglu et al. [25] showed that machine learning can be used to extract meaningful information from cache side-channel leakage and recover the web traffic of users.

The easiest countermeasure against side-channel attacks is to drown the leakage of the sensitive computation in noise to cloak it from the attacker. However, this defense has proven to be ineffective, in addition to being computationally expensive with significant performance overhead. In this study, we argue that we can do much better than random noise. By using adversarial methods, we can craft much smaller, intelligent noise to accompany processes instead of costly random noise. By using AL, we achieve a better cloaking effect with much smaller changes to the trace, hence causing minimal overhead to the system. Also, the proposed defense does not require a redesign of the software or hardware stacks, making it practically deployable. In addition, it can be deployed as an opt-in service that users can enable or disable at will, depending on their privacy needs at the time.

In summary, as stated in [19], attacking machine learning is easier than defending it. Therefore, if used strategically in a non-traditional way, i.e. as a defensive countermeasure, adversarial learning (AL) can be quite advantageous against malicious parties with AI capabilities. In this work, we expand on this idea and show that AL is indeed a useful defensive tool to cloak private processes from AI-capable adversaries.

Our Contribution

This work proposes a framework and explores all the necessary steps to successfully cloak processes against side-channel attacks, as well as the defenses a malicious party can use to bypass the cloaking. More specifically, in this paper we:

  • show that crypto processes can be profiled with high accuracy via their side-channel leakage using deep learning and various classical machine learning models. We classify 20 types of processes using readily available, high-resolution Hardware Performance Counters (HPCs). Further, we investigate the effect of parameter choices such as the number of features, the number of samples, and the data collection interval on the accuracy of such classifiers.

  • present the use of adversarial learning methods to craft perturbations and add them to the system hardware trace to cloak the side-channel leakage of private processes. We show that this is a strong defense against an attacker using DL classifiers.

  • test and quantify the efficiency of different adversarial learning methods against leakage classifiers and present the quality and applicability of each attack.

  • show that even when adversarial defense methods such as adversarial re-training and defensive distillation are employed by the attacker, adversarial perturbations still manage to cloak the process in many settings.

2 Background

In this section, we provide the necessary background information to better understand the attack, adversarial sample crafting as a countermeasure and the improved attack. More specifically, we go over microarchitectural attacks, Hardware Performance Counters, Convolutional Neural Networks (CNNs), and AL attacks.

2.1 Microarchitectural Attacks

Over the last decade, there has been a surge of microarchitectural attacks. Low-level hardware bottlenecks and performance optimizations have been shown to allow processes running on shared hardware to influence and retrieve information about one another. For instance, cache side-channel attacks like Prime&Probe and Flush+Reload exploit cache and memory access time differences to recover fine-grain secret information [55, 48, 68, 24]. In fact, researchers have shown that through this leakage it is possible to recover secret cryptographic keys [6, 73, 74, 32, 31, 23, 37, 56, 22]. In these works, the attacker exploits microarchitectural leakage stemming from memory access time variations, e.g. when the data is retrieved from small but fast caches as opposed to the slower DRAM memory.

2.2 Hardware Performance Counters

Hardware Performance Counters (HPCs), or Hardware Performance Events, are special-purpose registers that provide low-level execution metrics directly from the CPU. This low-level information is particularly useful during software development to detect and mitigate performance bottlenecks before deployment. For instance, a low number of cache hits and a high number of memory accesses can hint at an improperly ordered loop. By re-ordering some operations, a developer can significantly improve the performance of the program. While there are many different HPCs, the availability of a specific counter depends on the specific CPU model. Moreover, the number of HPCs that can be monitored simultaneously depends both on the CPU model and on the selected HPCs. Since some HPCs are derived from others, their use puts additional limitations on the monitoring process.

In addition to performance optimization, HPCs have also proven useful at runtime for system health checks and anomaly detection. For instance, in [1], Alam et al. leverage the perf_event API to detect microarchitectural side-channel attacks. In 2009, Lee et al. [71] showed that HPCs can be used on cloud systems to provide real-time side-channel attack detection. In [11, 12], researchers have shown that HPCs can be used to detect cache attacks. Moreover, Allaf et al. [2] used a neural network, a decision tree, and kNN to specifically detect Flush+Reload and Prime&Probe attacks on AES.

Moreover, researchers have shown that the fine-grain information provided by HPCs can also be used to violate personal privacy. In [25], Gulmezoglu et al. show that HPC traces can be used to reveal the websites visited on a system. The attack relies on machine learning techniques such as auto-encoders, SVMs, kNN and decision trees, and works even against the privacy-conscious Tor browser.

2.3 Preventing Microarchitectural Leakage

Side-channel leakage through microarchitectural features, whether measured via cache access profiles, by monitoring branch predictors, or directly via HPCs, can be prevented. The most effective technique is constant-time implementation, where execution behavior must be independent of sensitive inputs. Several tools to validate constant-time properties have been proposed, e.g. Langley's ctgrind, ct-verif [3] and CacheD [66]. Raccoon [57] automates the enforcement of constant-time control flow. Yet, the adoption rate of constant-time implementations is low even for cryptographic libraries, and they are probably not even considered for other applications. Besides requiring increased development effort, constant-time implementations often have significantly decreased average-case performance. Alternatives that reduce side-channel leakage at lower overhead use sophisticated randomization techniques, e.g. software diversity [15], where programs randomly choose from various different implementations of the same functionality or simply access random memory locations during run time [75]. While often more efficient than constant-time code, none of these designs is optimized to minimize the overhead while maximizing the obfuscation effect of the countermeasure.

2.4 Convolutional Neural Networks

A Convolutional Neural Network (CNN) is a supervised, feed-forward artificial neural network architecture. Supervised learning simply means that the data used to train the model is labeled. Other than labeling, training a CNN requires minimal human intervention and is easily automatable.

The biggest advantage of CNNs is that they do not saturate easily and can reach unprecedented accuracy levels with more training data. Unlike classical machine learning methods, CNNs do not require data features to be identified and pre-processed before training. Instead, CNNs discover and learn the relevant features in the data without human intervention, making them very suitable for automated tasks. The fact that features do not need to be extracted manually offers great flexibility and makes CNNs the go-to classifier for processing large amounts of raw data.

The disadvantage of CNNs, on the other hand, is the need for large amounts of data and a computationally expensive training process compared to classical ML models. Even so, in the past 5 years, a considerable number of results have shown that CNNs can surpass humans in certain tasks which only a decade ago were considered nearly impossible to automate. This breakthrough is fueled by the rapid increase in GPU-powered parallel processing power and the advancements in deep learning.

In the past decade, CNNs have been very successfully applied to image, malware and many other classification problems. Training a CNN model is done in three phases. First, the labeled dataset is split into three parts: training, validation and test data. Then the training data is fed to the CNN with initial hyper-parameters, and the classification accuracy is measured on the validation dataset. Guided by the validation accuracy, the hyper-parameters are updated to increase the accuracy of the model while maintaining its generality. After the desired level of hyper-parameter optimization is reached, the model is evaluated on the test data to obtain its final accuracy.
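To make the three-phase workflow concrete, the sketch below shows a typical split-train-tune-test loop in Python. The 80/10/10 split ratio, the use of scikit-learn for splitting, and the `model`, `X`, and `y` variables are assumptions for illustration; the paper does not state its exact split.

```python
from sklearn.model_selection import train_test_split

# Assumed inputs: X holds the leakage traces, y the process labels,
# and model is a compiled Keras classifier (see Section 3.2.2).
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.10, stratify=y, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=1/9, stratify=y_trainval, random_state=0)

# Train with candidate hyper-parameters and watch the validation accuracy;
# adjust the hyper-parameters and repeat. The test set is used only once,
# at the very end, to report the final accuracy.
model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=20)
test_loss, test_acc = model.evaluate(X_test, y_test)
```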

2.5 Adversarial Learning

AL is a subfield of machine learning that studies the robustness of trained models in adversarial settings. The problem stems from the underlying assumption that the training and the test data come from the same source and are consistent in their features. Studies have shown, however, that by introducing some small external noise, in this context commonly referred to as adversarial perturbations, it is possible to craft adversarial samples and manipulate the output of machine learning models. In other words, by carefully crafting small perturbations, one can push a test sample from the boundaries of one class into another. Moreover, due to the mathematical properties of the high-dimensional space in which the classifier operates, very small perturbations can be enough to push a sample into other classes. AL refers to the group of techniques that are used to perturb test samples and force classifiers to misclassify them. While there are many different methods of crafting such perturbations, ideally they should be minimal and not easily detectable.

Adversarial attacks on classical machine learning classifiers (under both white-box and black-box scenarios) have been known for quite some time [40, 36, 30, 9, 76, 10, 7, 8]. However, it was Szegedy et al. [62] who first introduced AL attacks on DNNs. In their 2013 study, they show that very small perturbations that are indistinguishable to the human eye can indeed fool CNN image classifiers like ImageNet models. The perturbations in the study are calculated using a technique called Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS). This algorithm searches the variable space to find parameter vectors (perturbations) that successfully fool the classifier. Later, in 2014, Goodfellow et al. [20] improved the attack by using the Fast Gradient Sign Method (FGSM) to efficiently craft minimally different adversarial samples. Unlike the L-BFGS method, FGSM is computationally conservative and allows much faster perturbation crafting.
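Concretely, FGSM perturbs an input x in the direction of the sign of the loss gradient, x_adv = x + epsilon * sign(grad_x J(theta, x, y)). The sketch below illustrates this with the modern tf.keras GradientTape API; it is an illustration only, not the authors' code (their experiments used Keras 2.1.3 with TensorFlow 1.4).

```python
import tensorflow as tf

def fgsm_perturb(model, x, y_true, epsilon=0.01):
    """Return x + epsilon * sign(grad_x loss), i.e. the FGSM adversarial sample."""
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = tf.keras.losses.categorical_crossentropy(y_true, model(x))
    grad = tape.gradient(loss, x)
    return x + epsilon * tf.sign(grad)
```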

In 2016, Papernot et al. [51] further improved upon Goodfellow's FGSM by using Jacobian saliency maps to craft adversarial samples. Unlike the previous attacks, the Jacobian Saliency Map Attack (JSMA) does not modify randomly selected data points or pixels in an image. Instead, it finds the points of high importance with regard to the classifier decision and then modifies these specific pixels. These points are found by taking the Jacobian matrix of the loss function for a specific input sample. JSMA allows an attacker to craft adversarial samples by modifying fewer data points than FGSM.

In 2016, multiple new adversarial attacks were discovered [35, 38, 49]. Moreover, this research showed that adversarial samples are transferable, i.e. perturbations that can fool a model can also work on other models trained on the same task. In [50], Papernot et al. showed that adversarial attacks can also succeed under the black-box attack scenario, where an attacker only has access to the classification labels. In this scenario, the attacker has no access to model parameters such as weights, biases, classification confidence or the loss, and therefore cannot directly compute or use gradients to craft a perturbation. Instead, the attacker uses the target model as an oracle that labels the inputs and then uses these labeled samples to train her own classifier. The authors demonstrated the feasibility of the attack on MetaMind and on Deep Neural Network (DNN) classifiers hosted by Amazon and Google. With misclassification rates of 84.24%, 96.19% and 88.94% respectively, they were able to fool the targeted classifiers.

In [16], researchers have shown that by iteratively morphing a structured input, it is possible to craft adversarial samples under a black-box attack scenario. The authors implemented the attack against a PDF malware classifier and reported a 100% evasion rate. Moreover, the study acknowledges that the black-box attack model has a cost of obtaining labeled data from observations, and it defines and uses a cost function that takes the number of observations into account. The attack works by adding and/or removing compilable objects in the PDF. The black-box scenario does not assume access to confidence scores from the model under attack, only to its class output.

In summary, AL is an active research area with a plethora of new attacks, defenses and application cases emerging daily [65, 18, 46, 13, 60].

3 Methodology

Figure 1: The outline of the cloaking methodology. Alice, the defender, runs the process X and leaks its side-channel trace. Eve, the attacker, obtains the leakage and identifies the process using the classifier C. Then, Alice crafts an adversarial perturbation and forces C to misclassify the trace as another process X'. Eve then trains a hardened classifier C' with adversarial re-training and defensive distillation. Now, Eve can classify the perturbed traces partially correctly. However, when Alice crafts new perturbations against C', X is again misclassified.

Our goal is to show that side-channel classifiers, especially complex and powerful DL-based classifiers, can be successfully stopped using the concept of AL. To validate this assumption, we first train DL-based classifiers using real side-channel data, and then show their degradation as a result of AL techniques, even if the DL-based classifier is aware of the AL-based cloaking defense. In our experiments, we take the following steps:

  1. Training the process classifier C using side-channel leakage

  2. Crafting adversarial samples to cloak the user processes and force C to misclassify.

  3. Training a new classifier with adversarial defense methods: Defensive Distillation and Adversarial Re-training.

  4. Testing previously crafted adversarial samples against the new classifier C', and crafting and testing new adversarial samples against this protected classifier.

We outline this methodology in Figure 1. In the first stage, Alice, the defender, runs a privacy-sensitive process X. The eavesdropper Eve collects the side-channel leakage trace, feeds it into her classifier C and discovers what type of process X is. Then, in stage 2, Alice cloaks her process by crafting an adversarial sample. When faced with this adversarial sample, Eve's classifier C fails and misclassifies the leakage trace as another process X'. In the third stage, Eve trains a new classifier C' using defensive distillation and adversarial re-training to protect it from the misclassifications caused by the adversarial perturbations. In the final stage, Alice first tests previously crafted adversarial samples against Eve's protected classifier C'. Then, Alice updates her adversarial sample crafting to target and fool C' rather than the original classifier C.

We apply this methodology to a scenario where a malicious party trains a Deep Neural Network (DNN) to classify running processes using their HPC traces as input. This information is extremely useful to the attacker since it helps her choose a specific attack or even pick a vulnerable target among others. Once a vulnerable target is found, the attacker can perform microarchitectural or application-specific attacks. To circumvent this information leakage and protect processes, the defender attempts to mask the process signature. This masking should not interfere with the running process and should not add too much overhead to the overall system performance. This is why crafting minimal adversarial perturbations is crucial to the practicality of the defense.

The attacker periodically collects five HPC values for 10 milliseconds in total at 10-microsecond intervals. This results in 1000 samples per counter and a total of 5000 data points per trace. Later, this trace is fed into classical machine learning and DL classifiers. In this section, we explain our choice of the specific HPCs, the application classifier design and implementation details, and the AL attacks applied to these classifiers, and finally we test the efficiency of adversarial defenses against our cloaking method.

3.1 HPC Profiling

HPCs are special-purpose registers that provide detailed information on low-level hardware events in computer systems. These counters periodically count specified events like cache accesses, branches, TLB misses and many others. This information is intended to be used by developers and system administrators to monitor and fine-tune the performance of applications. The availability of a specific counter depends on the architecture and model of the CPU. Among the many available HPCs, we have selected the following five for the classification task:

  1. Total Instructions: the total number of retired, i.e. executed and completed, CPU instructions.

  2. Branch Instructions: the number of branch instructions (both taken and not taken).

  3. Total Cache References: the total number of L1, L2 and L3 cache hits and misses.

  4. L1 Instruction Cache Miss: the number of L1 instruction cache misses.

  5. L1 Data Cache Miss: the number of L1 data cache misses.

We selected these HPCs to cover a wide variety of hardware events with both coarse and fine-grain information. For instance, Total Instructions does not directly provide any information about the type of the instructions being executed. However, different instructions execute in a varying number of cycles, even if the data is loaded from the same cache level. This execution time difference translates indirectly into the total number of instructions executed in a given time period and hints at the instructions being executed.

The Branch Instructions HPC provides valuable information about the execution flow as well. Whether the branches are taken or not taken, the total number of branches in the executed program remains constant for a given execution path. This constant in the leakage trace helps eliminate noise elements and increases classification accuracy.

The Total Cache References HPC provides information similar to the Branch Instructions HPC in the sense that it does not leak finer details like the specific cache set or even the cache level. However, it carries information about the total memory access trace of the program. Regardless of whether the data is loaded from the CPU cache or from memory, the total number of cache references remains the same for a given process.

The L1 Instruction Cache Miss and the L1 Data Cache Miss HPCs provide fine-grain information about cold-start misses in the L1 cache. Since the L1 cache is small, the data in this cache level is constantly replaced with new data, incrementing these counters. Moreover, separate counters for instruction and data misses allow the profiler to distinguish between arithmetic- and memory-intensive operations, increasing the profiler's accuracy. Finally, all five HPCs are interval counters, meaning that they count specific hardware events within selected time periods.
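These five counters roughly map onto standard Linux perf events. The sketch below shells out to perf stat as an approximation only; it is not our measurement setup (we used PAPI via quickhpc for 10-microsecond sampling, whereas perf's interval printing bottoms out at 10 ms), and event availability depends on the CPU model. The PID is a hypothetical placeholder.

```python
import subprocess

# Rough perf-event counterparts of the five HPCs listed above.
events = ",".join([
    "instructions",           # Total Instructions (retired)
    "branch-instructions",    # Branch Instructions
    "cache-references",       # Total Cache References
    "L1-icache-load-misses",  # L1 Instruction Cache Miss
    "L1-dcache-load-misses",  # L1 Data Cache Miss
])

pid = "12345"  # hypothetical PID of the profiled process

# Print counter deltas every 10 ms (perf's minimum interval) in CSV format
# while attached to the target process for one second.
subprocess.run(["perf", "stat", "-e", events, "-I", "10", "-x", ",",
                "-p", pid, "--", "sleep", "1"])
```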

3.2 Classifier Design and Implementation

In the first part of the study, we design and implement classifiers that can identify processes using their HPC leakage. To show the viability of such classifiers, we chose 20 different ciphers from the OpenSSL 1.1.0 library as the classification targets. Note that these classes include ciphers with both very similar and extremely different performance traces, e.g. AES-128, ECDSAB571, ECDSAP521, RC2 and RC2-CBC. Moreover, we also trained models to detect the version of the OpenSSL library for a given cipher. For this task, we used OpenSSL versions 0.9.8, 1.0.0, 1.0.1, 1.0.2 and 1.1.0. The full list of classified processes is given in Appendix A.2.

3.2.1 Classical Machine Learning Classifiers:

In this study, we refer to non-neural-network classification methods as classical machine learning classifiers. In order to compare and contrast classical machine learning methods with CNNs, we trained a number of different classifiers using the Matlab Classification Learner Toolbox. The trained classifiers include SVMs, decision trees, kNNs and a variety of ensemble methods.

3.2.2 Deep Learning Classifier:

We designed and implemented the CNN classifier using Keras with the Tensorflow-GPU back-end. The model has a total of 12 layers, including the normalization and dropout layers. The input layer, the first convolution layer, has a total of 5000 neurons to accommodate the 10 milliseconds of leakage data with 5000 HPC data points. Since the network is moderately deep but extremely wide, we used 2 convolution and 2 MaxPool layers to reduce the number of dimensions and extract meaningful feature representations from the raw trace.

In addition to the convolution and MaxPool layers, we used batch normalization layers to normalize the data from different HPC traces. This is a crucial step since the hardware leakage trace is heavily dependent on the system load and scales with the overall performance. Due to this dependency, the average execution time of a process, or of parts of a process, can vary from one execution to another. Moreover, in the system-wide leakage collection scenario, the model would train on this system load when it should instead be treated as noise. If not handled properly, the noise and the shifts in the time domain result in overfitting the training data to the dominant average execution time, decreasing the classification rate. By using batch normalization layers, the model learns the features within short time intervals and the relations between different HPC traces. In the output layer, we use 20 neurons with softmax activation, representing the 20 classes of processes. Finally, we use the categorical cross-entropy loss function with the Adam optimizer to train the model.

Our CNN classifier is constructed using the layers given below; a Keras sketch of this architecture follows the list:

  1. Convolution Layer (50, (10,1))

  2. MaxPool Layer (10,1)

  3. Batch Normalization Layer

  4. Dropout Layer (0.25)

  5. Convolution Layer (100, (10,1))

  6. MaxPool Layer (10,1)

  7. Batch Normalization Layer

  8. Dropout Layer (0.25)

  9. Flatten Layer

  10. Dense Layer (400)

  11. Dropout Layer (0.25)

  12. Dense Layer (20)
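The sketch below reconstructs this architecture in Keras. The layer sequence, filter counts, kernel and pool sizes, dropout rates, softmax output, loss and optimizer follow the text; the input tensor shape (5000, 1, 1) and the ReLU activations are assumptions, since they are not stated explicitly.

```python
from keras.models import Sequential
from keras.layers import (Conv2D, MaxPooling2D, BatchNormalization,
                          Dropout, Flatten, Dense)

model = Sequential([
    # 5000 HPC samples per trace (5 counters x 1000 points each);
    # the (5000, 1, 1) input layout is an assumption.
    Conv2D(50, (10, 1), activation='relu', input_shape=(5000, 1, 1)),
    MaxPooling2D(pool_size=(10, 1)),
    BatchNormalization(),
    Dropout(0.25),
    Conv2D(100, (10, 1), activation='relu'),
    MaxPooling2D(pool_size=(10, 1)),
    BatchNormalization(),
    Dropout(0.25),
    Flatten(),
    Dense(400, activation='relu'),
    Dropout(0.25),
    Dense(20, activation='softmax'),  # 20 process classes
])

model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
```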

3.3 Adversarial Learning Attacks

AL remains an important open research problem in AI. Traditionally, AL is used to fool AI classifiers and test model robustness against malicious inputs. In this study however, we propose to use AL as a defensive tool to mask the side-channel trace of applications and protect against microarchitectural attacks and privacy violations. In the following, we explain the specific adversarial attacks that we have used. We consider the following 10 attacks:


  • Additive Gaussian Noise Attack (AGNA): Adds Gaussian noise to the input trace to cause a misclassification. The standard deviation of the noise is increased until the misclassification criterion is met. This AL attack method is ideal for the cloaking defense due to the ease of implementing all-additive perturbations: a sister-process can actuate such additive changes in the side-channel trace simply by performing operations that increment specific counters like cache accesses or branch instructions. A minimal sketch of this additive crafting loop is given after this list.

  • Additive Uniform Noise Attack (AUNA): Adds uniform noise to the input trace. The standard deviation of the noise is increased until the misclassification criterion is met. Like AGNA, this attack is also easy to implement as a sister-process due to its additive property.

  • Blended Uniform Noise Attack (BUNA): Blends the input trace with uniform noise until the misclassification criterion is met.

  • Contrast Reduction Attack (CRA): Calculates perturbations by reducing the 'contrast' of the input trace until a misclassification occurs. In the case of the side-channel leakage trace, the attack smooths parts of the original trace and reduces the distance between the minimum and the maximum data points.

  • Gradient Attack (GA): Creates a perturbation from the loss gradient with respect to the input trace. The magnitude of the added gradient is increased until the misclassification criterion is met. The attack only works when the model has a gradient.

  • Gaussian Blur Attack (GBA): Adds Gaussian blur to the input trace until a misclassification occurs. Gaussian blur smooths the input trace and reduces the amplitude of outliers. Moreover, this method reduces the resolution of the trace and cloaks the fine-grain leakage.

  • GSA (Gradient Sign Attack) [20]: Also called the Fast Gradient Sign Method, this attack was proposed by Goodfellow et al. in 2014. GSA works by adding the sign of the elements of the gradient of the cost function with respect to the input trace. The gradient sign is multiplied with a small constant that is increased until a misclassification occurs.

  • L-BFGS-B Attack (LBFGSA) [63]: The attack utilizes the limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm with box constraints, an iterative method for solving nonlinear optimization problems, to craft perturbations that have minimal distance to the original trace. The attack morphs the input toward a specific class. However, in our experiments, we did not target a specific class and instead chose random classes as the target.

  • Saliency Map Attack (SMA) [53]: The SMA works by calculating the forward derivative of the model to build an adversarial saliency map. This map reveals which input features e.g. pixels in an image, have a stronger effect on the targeted misclassification. Using this information, an adversary can modify only the features with high impact on the output and produce minimal perturbations.

  • Salt and Pepper Noise Attack (SPNA): Works by adding salt-and-pepper noise (also called impulse noise) to the input trace until a misclassification occurs. In images, salt and pepper values correspond to white and black pixels, respectively. For the side-channel leakage trace, these values correspond to the upper and lower bounds of the trace.
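As referenced in the AGNA item above, the crafting loop for the additive attacks is straightforward: grow the noise magnitude until the classifier's decision flips. A minimal sketch, assuming model is the trained Keras classifier from Section 3.2.2 and the traces are normalized; the noise budget and step count are illustration values, not the Foolbox defaults.

```python
import numpy as np

def additive_gaussian_cloak(model, trace, true_label, max_sigma=0.1, steps=20):
    """AGNA-style crafting: increase the Gaussian noise standard deviation
    until the classifier no longer predicts true_label for the trace."""
    for sigma in np.linspace(0.0, max_sigma, steps)[1:]:
        noise = np.random.normal(0.0, sigma, size=trace.shape)
        candidate = trace + noise                        # purely additive change
        pred = np.argmax(model.predict(candidate[np.newaxis, ...]), axis=1)[0]
        if pred != true_label:
            return candidate, sigma                      # cloaked trace found
    return None, None                                    # failed within the budget
```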

3.4 Adversarial Learning Defenses

In order to assess the viability of any defense method that an attacker could use against our adversarial perturbations, we explored two methods: adversarial re-training and defensive distillation (DD). These defenses are an integral part of this study since an attacker capable of overcoming adversarial perturbations would render any cloaking mechanism moot.

3.4.1 Gradient Masking:

The term gradient masking was introduced in [50] to represent a group of defense methods against adversarial samples. The defense works by hiding the gradient information from the attacker to prevent her from crafting adversarial samples. Papernot et al. [50], however, showed that the method fails under the oracle access scenario. An attacker can query the classifier with enough samples to create a cloned classifier. Since the clone and the original classifier have correlated gradients, the attacker can use the gradients from the clone to craft adversarial samples, bypassing the defense. Due to the known weaknesses and limitations of this defense method, we do not investigate it further in this study.

3.4.2 Adversarial Re-training:

This defense idea was first proposed by Szegedy et al. in 2013 [62]. Later, in 2014, Goodfellow et al. [20] improved the practicality of the method by showing how to craft adversarial samples efficiently using the Fast Gradient Sign Method. In this defense, the model is re-trained using adversarial samples. By doing so, the model is 'vaccinated' against adversarial perturbations and can classify them correctly. In other words, the method aims to teach the adversarial perturbations to the model so that it generalizes better and is not fooled by small perturbations. While this method works successfully against the specific type of attack it is trained on, it has been shown to fail against attack methods that the model was not trained for. Nevertheless, we apply this defense method to our classifiers and investigate its applicability to side-channel leakage classifiers.
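Conceptually, adversarial re-training only extends the training set and continues training. A minimal sketch, assuming x_adv holds previously crafted adversarial traces paired with their correct one-hot labels y_adv:

```python
import numpy as np

# x_train/y_train: original leakage traces and one-hot labels.
# x_adv/y_adv: crafted adversarial traces with their *correct* labels.
x_aug = np.concatenate([x_train, x_adv])
y_aug = np.concatenate([y_train, y_adv])

# Continue training the existing classifier on the augmented set so that
# perturbed traces are classified correctly.
model.fit(x_aug, y_aug, epochs=10, validation_split=0.1)
```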

3.4.3 Defensive Distillation:

DD was proposed by Papernot et al. [49] in 2016 to protect DL models against AL attacks. The goal of this technique is to increase the entropy of the prediction vector so that the model is not easily fooled. The method works by pre-training a model with a custom output layer. Normally, the softmax temperature is set as small as possible to train a tightly fitted, highly accurate model. In the custom layer, however, the temperature is set to a higher value to distill the probability outputs. The first model is trained on the training data using hard labels, i.e. the correct class label is set to '1' and all other class labels are set to '0'. After this model is trained, the training samples are fed into it and its probability outputs are recorded as soft labels. These soft labels are then used to train a second, distilled model on the same training data. This process smooths the model surface in the directions that an adversary would use to craft perturbations. The smoothing increases the perturbation size required to craft an adversarial sample and invalidates some of the previously crafted adversarial samples. The amount of smoothing can be adjusted via the temperature value. Note, however, that DD can reduce the classification accuracy significantly if the temperature is set too high.
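A minimal sketch of this two-model procedure is given below. The hypothetical base_cnn() helper is assumed to return the convolutional part of the Section 3.2.2 model (everything before the final Dense layer); T=20 is one of the temperature values swept in our experiments.

```python
from keras.layers import Dense, Lambda, Activation
from keras.models import Model

def with_temperature_softmax(features, num_classes=20, T=20.0):
    """Attach a temperature-scaled softmax head to a feature-extractor model."""
    logits = Dense(num_classes)(features.output)
    scaled = Lambda(lambda z: z / T)(logits)   # divide logits by temperature T
    return Model(features.input, Activation('softmax')(scaled))

# 1) Train the first model at temperature T on hard (one-hot) labels.
teacher = with_temperature_softmax(base_cnn(), T=20.0)
teacher.compile('adam', 'categorical_crossentropy', metrics=['accuracy'])
teacher.fit(x_train, y_train, epochs=20)

# 2) Record the first model's probability outputs as soft labels.
soft_labels = teacher.predict(x_train)

# 3) Train the distilled model (same architecture, same T) on the soft labels.
distilled = with_temperature_softmax(base_cnn(), T=20.0)
distilled.compile('adam', 'categorical_crossentropy', metrics=['accuracy'])
distilled.fit(x_train, soft_labels, epochs=20)
```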

4 Experiment Setup and Results

In this section, we give the details of our experiment setup, present the results of different adversarial attacks on the crypto-process classifier, and finally present the results on the hardened, adversarially re-trained and distilled models.

4.0.1 Experiment Setup:

DL models perform mammoth amounts of matrix multiplications that parallelize well on modern GPU systems. For that reason, we used a workstation with two Nvidia 1080Ti (Pascal architecture) GPUs, a 20-core Intel i7-7900X CPU and 64 GB of RAM. On the software side, the classifier model is coded using Keras v2.1.3 with the Tensorflow-GPU v1.4.1 back-end and other Python3 packages such as Numpy v1.14.0, Pandas, Sci-kit and H5py.

The HPC data is collected from a server with an Intel Xeon E5-2670 v2 CPU running Ubuntu 16 LTS. This specific CPU model has 85 possible hardware events, of which 50 are available to user space. To access the HPCs, we had the choice of using the Perf and PAPI libraries. Due to the lower sampling rate of Perf, we chose PAPI with the QuickHPC [14] front-end. QuickHPC is a tool developed by Marco Chiappetta to collect high-resolution HPC data using the PAPI back-end. It is over 30000 times faster than perf-stat and provides an easy-to-use interface.

4.1 Classification Results

The classifiers are trained to identify 20 classes representing a diverse set of ciphers, as well as five different versions of OpenSSL, as detailed in Section 3.2.

4.1.1 CNN Classifier:

For the CNN classifier, we first investigated the effect of the number of HPCs collected and trained our models for 100 epochs with data from a varying number of HPCs. Not surprisingly, even with only 1 HPC, our CNN classifier achieved 81% validation accuracy, albeit after a high number of epochs. Moreover, we noticed that after the 30th epoch, the model overfitted the training data, i.e. the validation accuracy started to drop while the training accuracy kept increasing. When we increased the number of HPCs collected, our models became much more accurate and achieved over 99% validation accuracy, as seen in Figure 3. Moreover, when we used the data from all 5 HPCs, our model achieved 99.8% validation accuracy in less than 20 epochs. While our validation accuracy saturates even with only 2 HPCs (Total Instructions and Branch Instructions), we decided to use all 5 of them. We made this decision because in a real-world attack scenario, an attacker might be using any one or more of the HPCs. Since it would not be known which specific hardware event(s) an attacker would monitor, we decided to use all 5, monitoring different low-level hardware events to provide comprehensive cloaking coverage.

Figure 2: Results for the CNN classifier trained using varying number of features. Models reach highest validation accuracy with 1000 and 2000 features.

To find the optimal number of features per HPC, we trained multiple models using various numbers of features. As shown in Figure 2, the validation accuracy saturates at 1000 and 2000 features, and the validation loss drops after 1000 features. For this reason, we chose to use 1000 features in our experiments.

Figure 3: Results of CNN classifiers trained with varying number of HPCs. Even using data from a single HPC trace is enough to obtain high accuracy top-1 classification rates, albeit taking longer to train.

After deciding to use data from all 5 HPCs, we investigated how the number of training samples affects the CNN classifier's validation accuracy. For that, we trained 6 models with a varying number of training samples. For the first model, we used only 100 samples per class (2000 samples in total), and we later trained models with 300, 1000, 3000, 10000 and 30000 samples per class. With the first model, we achieved 99.8% validation accuracy after 40 epochs of training. When we trained models with more data, we reached similar accuracy levels in far fewer epochs. To make a good trade-off between the dataset size and the training time, we opted to use 1000 samples per class. This model reaches 100% accuracy within 20 epochs of training, as shown in Figure 4. Finally, our last model achieved 100% accuracy after just 4 epochs when trained with 30000 samples per class. Additional results on varying the number of samples are presented in Appendix A.3.

Figure 4: Results of the CNN classifier trained using 100 and 30000 samples per class, in order. First, the model is trained with 100 samples per class for about 40 epochs to reach 99% accuracy. When we increase the number of samples per class to 1000, we achieve the same accuracy in about 20 epochs of training.

We also verified that different versions of OpenSSL can be distinguished for each cipher. For each of the 20 analyzed ciphers, we built classifiers to identify which of the five analyzed versions they belong to. Figure 5 presents the classification results of two models trained using 1 and 5 HPC traces, respectively. More results with 2, 3 and 4 HPCs are presented in Appendix A.3. As cipher updates between versions can be very small, the added information from sampling several HPCs is essential for high classification rates, as can be seen from the results.

Figure 5: Results of the CNN trained for OpenSSL version detection, using a varying number of HPCs. When trained with only a single HPC trace, the validation accuracy of the model saturates at 61%. When the data from 5 HPCs is used, the validation accuracy reaches 99%.
Classification Method | Without PCA | With PCA (99.5% variance)
Fine Tree | 98.7 | 99.9
Medium Tree | 85.4 | 94.8
Coarse Tree | 24.9 | 25
Linear Discriminant | 99.6 | 99.7
Quadratic Discriminant | N/A | 99.4
Linear SVM | 99.9 | 98.2
Quadratic SVM | 99.9 | 96.9
Cubic SVM | 99.9 | 94.3
Fine Gaussian SVM | 40 | 88.2
Medium Gaussian SVM | 98.3 | 92.1
Coarse Gaussian SVM | 99.7 | 13.4
Fine kNN | 96.8 | 11.1
Medium kNN | 94.9 | 7.8
Coarse kNN | 85.5 | 5.2
Cosine kNN | 92.5 | 19.6
Cubic kNN | 85.2 | 7.7
Weighted kNN | 95.9 | 8.3
Boosted Trees | 99.2 | 99.8
Bagged Trees | 99.9 | 94.8
Subspace Discriminant | 99.8 | 99.7
Subspace kNN | 84.8 | 88.1
RUSBoosted Trees | 76 | 92.8
Best | 99.9 | 99.9

Table 1: Application classification accuracy (%) for the classical machine learning classifiers with and without PCA feature reduction.

4.1.2 Classical Machine Learning Methods:

In training our ML classifiers, the first challenge was that the side-channel leakage data is extremely wide. We chose to train our models using 1000 data points per HPC, with 5 HPCs monitored simultaneously. This parameter selection was done empirically to provide wide cloaking coverage and to train highly accurate models, as explained in Section 4.1.1. Using 1000 data points at 10-microsecond intervals per HPC allowed us to obtain high-quality data in a short observation window. Nevertheless, 5000 dimensions is unusually high for classifiers, especially considering that we are training multi-class classifiers with 20 possible classes.

In order to find optimal settings for the hardware leakage trace, we tried different parameters with each classifier. For instance, in the case of decision trees, we trained the 'Fine Tree', 'Medium Tree' and 'Coarse Tree' classifiers, which respectively allow 5, 20 and 100 splits (leaves in the decision tree) to better distinguish between classes. For the Gaussian SVMs, fine, medium and coarse refer to the kernel scale, set to sqrt(P)/4, sqrt(P) and sqrt(P)*4, respectively. As for kNN, the parameters refer to the number of neighbors and the distance metric used. A more detailed description of these classifiers is provided in Appendix A.1.

Results for the classical ML classifiers are given in Table 1. Classical machine learning algorithms achieve very high success rates for the given classification task. The trained models can indeed classify running processes by using their HPC traces. Note that the Quadratic Discriminant did not converge to a solution without PCA, hence no score is given in the table.

4.2 AL on the Unprotected Model

Next, we crafted adversarial perturbations for the unprotected classifiers by adapting the publicly available Foolbox [58] library to our scenario. The library implements numerous adversarial attacks and provides an easy-to-use API. For a selected attack, Foolbox crafts the necessary perturbations for a given sample and classifier model pair to 'fool' the given model. Detailed information about these attacks can be found in Section 3.3.
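A minimal sketch of how a single perturbation is crafted through Foolbox is given below, written against the 1.x API that was current at the time (newer Foolbox releases changed the interface); the trace bounds are an assumption.

```python
import numpy as np
import foolbox

# Wrap the trained Keras CNN for Foolbox; bounds are the min/max values of
# the normalized HPC traces (assumed to be [0, 1] here).
fmodel = foolbox.models.KerasModel(model, bounds=(0.0, 1.0))

attack = foolbox.attacks.AdditiveGaussianNoiseAttack(fmodel)

trace = x_test[0]                    # one 5000-point leakage trace
label = int(np.argmax(y_test[0]))    # its true class
adversarial = attack(trace, label)   # perturbed trace, or None if crafting fails
```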

Table 3 presents the classification accuracy of the perturbed samples. As the results show, almost all test samples are misclassified by the classifier model with very high confidence, over 86%. Other important metrics for adversarial learning are the Mean Absolute Distance (MAD) and the Mean Squared Distance (MSD) of the perturbed traces from the originals. These metrics quantify the size of the changes, i.e. the perturbations, made to the original traces by the various adversarial attack methods. The difference between MAD and MSD is that the latter is more sensitive to large changes due to the squaring operation. For instance, if an adversarial perturbation requires a significant change in 1 sample point among the 5000 features, it will have a stronger impact on the final MSD value than the same change distributed over a few points. MAD, however, depends on the overall change in the trace, i.e. all 5000 sample points have the same impact on the final distance. The results show that with most adversarial attacks, the perturbation MAD is around or well below 1% and within the ideal range. As noted earlier, the smaller the perturbation, the easier it is to actuate.
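For reference, both distance metrics reduce to simple element-wise averages over the 5000-point traces:

```python
import numpy as np

def mad(original, perturbed):
    """Mean Absolute Distance: every sample point contributes equally."""
    return np.mean(np.abs(perturbed - original))

def msd(original, perturbed):
    """Mean Squared Distance: large point-wise changes dominate the score."""
    return np.mean((perturbed - original) ** 2)
```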

Adversarial Attack | Adversarial Re-training | DD T=1 | DD T=2 | DD T=5 | DD T=10 | DD T=20 | DD T=30 | DD T=40 | DD T=50 | DD T=100
AGNA | 42 | 77 | 60 | 70 | 83 | 83 | 63 | 64 | 62 | 75
AUNA | 43 | 77 | 60 | 70 | 83 | 82 | 63 | 65 | 61 | 75
BUNA | 94 | 92 | 92 | 91 | 94 | 94 | 94 | 96 | 94 | 100
CRA | 94 | 95 | 99 | 94 | 94 | 94 | 94 | 99 | 88 | 95
GA | 97 | 99 | 72 | 83 | 99 | 99 | 96 | 80 | 90 | 90
GBA | 84 | 83 | 84 | 88 | 82 | 94 | 93 | 91 | 93 | 88
GSA | 99 | 91 | 90 | 91 | 99 | 99 | 99 | 95 | 99 | 98
LBFGSA | 51 | 76 | 63 | 63 | 78 | 87 | 65 | 65 | 63 | 71
SMA | 26 | 71 | 50 | 52 | 62 | 82 | 49 | 47 | 32 | 48
SPNA | 4 | 84 | 76 | 78 | 94 | 93 | 76 | 79 | 73 | 80

Table 2: Effectiveness of adversarial re-training and DD on 100,000 adversarial samples. The results show what percentage of previously successful adversarial samples are ineffective on the hardened models. Adversarial re-training is most effective against the GSA and least effective on SPNA. As for the DD, the protection changes with the distillation temperature and ranges between 32% and 99%.
Adversarial Attack | Unprotected: Original Classification Confidence | Unprotected: Adversarial Misclassification Confidence | Unprotected: Perturbation Size (MAD) | Hardened (Adv. Re-training): Original Classification Confidence | Hardened: Adversarial Misclassification Confidence | Hardened: Perturbation Size (MAD)
AGNA | 99 | 96 | 0.00294 | 92 | 82 | 0.00294
AUNA | 99 | 97 | 0.00292 | 91 | 82 | 0.00332
BUNA | 99 | 99 | 0.05000 | 96 | 93 | 0.05000
CRA | 99 | 99 | 0.05254 | 97 | 98 | 0.04999
GA | 99 | 99 | 0.00250 | 88 | 97 | 0.00398
GBA | 99 | 97 | 0.00080 | 93 | 74 | 0.00071
GSA | 99 | 99 | 0.00499 | 89 | 97 | 0.00596
LBFGSA | 99 | 86 | 0.00025 | 89 | 72 | 0.00031
SMA | 99 | 92 | 0.00001 | 88 | 63 | 0.00008
SPNA | 99 | 96 | 0.01528 | 92 | 74 | 0.08268

Table 3: Perturbation results against both the unprotected and the hardened (Adversarial Re-training) CNN classifier. The new adversarial samples have up to 29% lower misclassification confidence compared to the unprotected model. However, new adversarial samples are still misclassified with quite high confidence values in the range of 63-98%.
Adversarial Attack | Unprotected Model | Adversarial Re-trained | DD T=1 | DD T=2 | DD T=5 | DD T=10 | DD T=20 | DD T=30 | DD T=40 | DD T=50 | DD T=100
AGNA | 0.00294 | 0.00294 | 0.00035 | 0.00265 | 0.00062 | 0.01059 | 0.01389 | 0.00452 | 0.00766 | 0.01797 | 0.00431
AUNA | 0.00292 | 0.00332 | 0.00033 | 0.00238 | 0.00071 | 0.01114 | 0.01336 | 0.00514 | 0.00948 | 0.01787 | 0.00451
BUNA | 0.05000 | 0.05000 | 0.04989 | 0.05301 | 0.08198 | 0.10293 | 7.52282 | 0.05006 | 0.05002 | 0.16376 | 0.07748
CRA | 0.05254 | 0.04999 | 0.04999 | 0.10247 | 0.08548 | 0.10780 | NA | 0.04998 | 1.44158 | 0.27992 | 0.07448
GA | 0.00250 | 0.00398 | 0.00355 | 0.00283 | 0.10224 | 0.00783 | 0.00364 | 0.00860 | 2.75805 | 0.00849 | 0.00303
GBA | 0.00080 | 0.00071 | 0.00058 | 0.00092 | 0.00060 | 0.04992 | NA | 0.00169 | 0.00046 | 0.00062 | 0.00083
GSA | 0.00499 | 0.00596 | 0.00504 | 0.00534 | 0.13665 | 0.02482 | 0.00540 | 0.01064 | 0.04121 | 0.01670 | 0.00550
LBFGSA | 0.00025 | 0.00031 | 0.00533 | 0.00178 | 0.00024 | 0.00210 | 0.07670 | 0.00088 | 0.02872 | 0.00521 | 0.00920
SMA | 0.00001 | 0.00008 | 0.00003 | 0.00012 | 0.00007 | NA | NA | 0.00001 | 0.00001 | NA | NA
SPNA | 0.01528 | 0.08268 | 0.01060 | 0.00940 | 0.00360 | 0.01804 | 0.02000 | 0.00260 | 0.02000 | 0.02000 | 0.01980

Table 4: MADs of adversarial perturbations crafted against the unprotected, adversarially re-trained and DD-hardened classifiers. The application of adversarial re-training and DD only marginally increases the perturbation size needed to fool the classifier in most cases. In almost all cases, adversarial samples with a MAD of less than 1% are enough to fool the model, with three exceptions.

4.3 AL on the Hardened Models

Here we present the results of the AL attacks on the hardened classifier models. As explained in Section 3, we want to test the robustness of our cloaking mechanism against classifiers hardened with adversarial re-training and DD. To recap the scenario: Alice, the defender, wants to cloak her process by adding perturbations to her execution trace so that the eavesdropper Eve cannot correctly classify what Alice is running. Eve then notices or predicts the use of adversarial perturbations on the data and hardens her classifier model against AL attacks using adversarial re-training and DD.

In order to test the attack scenario on hardened models, we first craft 100,000 adversarial samples per adversarial attack type against the unprotected classifier. Then, we harden the classifier with the aforementioned defense methods and feed it the previously successful adversarial samples. Here, we aim to measure the level of protection provided by the adversarial re-training and DD methods.

As presented in Table 2, the application of both adversarial re-training and DD invalidates some portion of the previously crafted adversarial samples. For adversarial re-training, the success rate varies between 99% (GSA) and 4% (SPNA). In other words, 99% of the adversarial samples crafted using GSA against the unprotected model are invalid on the hardened model. As for DD, we see similar rates of protection, ranging from 61% up to 100% for old perturbations. Impressively, 100% of the adversarial samples crafted using BUNA are now ineffective against the model trained with DD at temperature T=100.

In short, by using the adversarial re-training or the DD, Eve can indeed harden her classifier against AL. However, keep in mind that Alice can observe or predict this behavior and introduce new adversarial samples targeting the hardened models. Below, we discuss the results of our experiments against such hardened models.

4.3.1 Adversarial Re-training:

After training the DL classifier model and crafting adversarial samples, we use these perturbed samples as training data and re-train the classifier. The motivation here is to teach the classifier model to detect these perturbed samples and classify them correctly. With this re-training stage, we test whether we can 'immunize' the classifier model against the given adversarial attacks. However, as the results in Table 3 show, all of the adversarial attacks still succeed, albeit requiring marginally larger perturbations. Moreover, while we observe a drop in the misclassification confidence, it is still quite high at over 63%, i.e. Eve's classifier can still be fooled by adversarial samples.

4.3.2 Defensive Distillation:

We used the technique proposed in [52] and trained hardened models with DD at various temperatures ranging from 1 to 100. Our results show that, even if the eavesdropper Eve hardens her model with DD, the trained model is still prone to adversarial attacks, albeit requiring larger perturbations in some cases. In Table 4, we present the MAD, i.e. the perturbation size, of various attack methods on both the unprotected and the hardened models. Our results show that the application of DD indeed offers a certain level of hardening and increases the perturbation sizes. However, this behavior is erratic compared to the adversarial re-training defense, i.e. the MAD is significantly higher at some temperatures while much smaller at others. For instance, the MAD of the AUNA perturbations against the unprotected model is 0.00292 on average over 100,000 adversarial samples. The perturbation size for the same attack drops to 0.00033 when the distillation defense is applied with temperature T=1. This, in turn, makes Eve's classifier model practically easier to fool. The same behavior is observed with the adversarial samples crafted using AGNA and GBA as well. For the cases where DD actually hardens Eve's model against adversarial samples, the MAD is still minimal. Hence, Alice can still successfully craft adversarial samples with minimal perturbations and fool Eve's model. Finally, the NA values in Table 4 represent cases where the model had very low classification accuracy and could not correctly classify the original samples.

5 Conclusion

Side-channel leakage on shared hardware systems poses a real and present danger to the security and privacy of users. Even when the software is perfectly isolated, co-resident tenants still share the underlying hardware and are prone to side-channel attacks. Especially considering the wide adoption of AI across many disciplines, it is not surprising that such attacks will become automated and even easier to perform in the future. There is a clear need for users to cloak their execution fingerprints on the underlying shared system.

With this work, we took a first step in this direction. Specifically, by making clever defensive use of adversarial crafting, we introduced a new cloaking defense against side-channel leakage classifiers. We first demonstrated the threat that side-channel leakage poses by processing leakage profiles into highly accurate AI models which may be used by an adversary to violate the privacy and security policies of applications. We trained various types of classifiers, including classical machine learning methods, and showed how the parameter selection affects the learning rate and the validation accuracy. While this is a strong threat to shared hardware systems, we showed that it can be mitigated using carefully crafted adversarial samples. Moreover, we investigated defenses that can potentially help an attacker bypass the adversarial samples. Our results show that even in the presence of defensive distillation and adversarial re-training, the defender can craft working adversarial samples and fool the attacker. These perturbations can be implemented as a sister-process that runs side-by-side with the original and easily causes misclassification by the attacker's model without any significant overhead.

If systems deploy the adversarial crafting-based cloaking mechanism that we have outlined in this work, users can enable such services on demand for sensitive operations. The efficient design and implementation of such defenses for shared hardware systems like the cloud remains an open research problem. Finally, to the best of our knowledge, this work is the first use of adversarial crafting for defensive purposes. We envision the same approach being useful in other application scenarios.

References

Appendix A Appendix

A.1 Classical Machine Learning Methods

The following is the full list of the classical machine learning methods used in the study.


  • Fine Tree

  • Medium Tree

  • Coarse Tree

  • Linear Discriminant

  • Quadratic Discriminant

  • Linear SVM

  • Quadratic SVM

  • Cubic SVM

  • Fine Gaussian SVM

  • Medium Gaussian SVM

  • Coarse Gaussian SVM

  • Fine kNN

  • Medium kNN

  • Coarse kNN

  • Cosine kNN

  • Cubic kNN

  • Weighted kNN

  • Boosted Trees

  • Bagged Trees

  • Subspace Discriminant

  • Subspace kNN

  • RUSBoosted Trees

The following descriptions of the classifiers are taken from the Mathworks webpage [45].

Decision Trees: are easy to interpret, fast for fitting and prediction, and low on memory usage, but they can have low predictive accuracy. Try to grow simpler trees to prevent overfitting. Control the depth with the Maximum number of splits setting.

Discriminant Analysis: is a popular first classification algorithm to try because it is fast, accurate and easy to interpret. Discriminant analysis is good for wide datasets. It assumes that different classes generate data based on different Gaussian distributions. To train a classifier, the fitting function estimates the parameters of a Gaussian distribution for each class.

SVM: classifies data by finding the best hyperplane that separates data points of one class from those of the other class. The best hyperplane for an SVM means the one with the largest margin between the two classes. Margin means the maximal width of the slab parallel to the hyperplane that has no interior data points.

kNN: typically has good predictive accuracy in low dimensions, but might not in high dimensions. It has high memory usage and is not easy to interpret. kNN classification categorizes query points based on their distance to points (or neighbors) in a training dataset, which can be a simple yet effective way of classifying new points. You can use various metrics to determine the distance. Given a set X of n points and a distance function, k-nearest neighbor (kNN) search lets you find the k closest points in X to a query point or set of points. kNN-based algorithms are widely used as benchmark machine learning rules.

Ensemble classifiers: Combine results from many weak classifiers into one high-quality ensemble model. Qualities depend on the choice of algorithm. All ensemble classifiers tend to be slow to fit because they often need many learners.

A.2 List of the Profiled Applications

The following is the full list of applications used as the test classes in our experiments.

  1. AES-128-CBC

  2. BF-CBC

  3. BLOWFISH

  4. CAMELLIA-128-CBC

  5. DES-CBC

  6. DES-EDE3

  7. DSA2048

  8. ECDH

  9. ECDHB571

  10. ECDHK571

  11. ECDHP521

  12. ECDSA

  13. ECDSAB571

  14. ECDSAP521

  15. HMAC

  16. MD4

  17. MD5

  18. RC2

  19. RC2-CBC

  20. RC4

A.3 CNN Classifier Results

Here we present the training and validation results of the attacker Eve's application classifier. As seen in Figure 6, when the number of samples per class is increased gradually from 100 to 30000, the validation accuracy saturates in fewer epochs.

Figure 6: Results of the CNN classifier trained using a varying number of samples per class: 100, 300, 1000, 3000 and 10000, in order. As the number of samples increases, the number of epochs needed to achieve high accuracy decreases.
Figure 7: Results of the CNN classifier trained for OpenSSL version detection using a varying number of HPCs. When trained with only a single HPC trace, the validation accuracy saturates at 81%. When more HPCs are used, the validation accuracy increases to 99%.