Detecting Covert Cryptomining using HPC

08/31/2019 ∙ by Mauro Conti, et al. ∙ Università di Padova

Cybercriminals have been exploiting cryptocurrencies to commit various unique financial frauds. Covert cryptomining - which is defined as an unauthorized harnessing of victims' computational resources to mine cryptocurrencies - is one of the prevalent ways cybercriminals use nowadays to earn financial benefits. Such exploitation of resources causes financial losses to the victims. In this paper, we present our novel and efficient approach to detect covert cryptomining. Unlike currently available solutions, which are tailored to a specific cryptocurrency or a particular form of cryptomining, our solution is generic. In particular, we focus on the core mining algorithms and utilize Hardware Performance Counters (HPC) to create clean signatures that grasp the execution pattern of these algorithms on a processor. We built a complete implementation of our solution employing advanced machine learning techniques. We evaluated our methodology on two different processors through an exhaustive set of experiments. In our experiments, we considered all the cryptocurrencies mined by the top-10 mining pools, which collectively represent the largest share (84% during Q3 2018) of the cryptomining market. Our results show that our classifier can achieve a near-perfect classification with samples as short as five seconds. Due to its robust and practical design, our solution can even adapt to zero-day cryptocurrencies. Finally, we believe our solution is scalable and can be deployed to tackle the rising problem of covert cryptomining.


I Introduction

Cryptomining, or simply mining, is the process of validating and adding new transactions to the blockchain digital ledger of a cryptocurrency. It is an essential process to keep most cryptocurrencies running. Typically, mining is a resource-intensive process that continuously performs heavy computations. Upon successful mining, miners receive newly generated cryptocoins as their remuneration. Usually, newer cryptocurrencies tend to pay a higher reward. Some cryptocurrencies, such as Monero, make mining feasible in web browsers, which enables even layman users to participate in mining.

After the success of Bitcoin [nakamoto2008bitcoin], several alternative cryptocurrencies (altcoins) have been introduced to the market. At the time of writing, there are over 2000 active cryptocurrencies [coinmarketcap]. The massive number of cryptocurrencies raises an enormous demand for mining. This demand continues to remain huge because mining, as mentioned before, is an inevitable operation to keep these virtual currency systems running. Such an immense demand for mining has attracted cybercriminals [h1, h2] seeking financial gains, who have already been exploiting cryptocurrencies to perform several types of financial crimes, e.g., ransomware [conti2018economic].

Motivation: A genuine miner has to invest in hardware and bear the significant cost of electricity to run the mining hardware as well as the cooling facilities [a1]. In fact, mining at personal expense (mainly, for electricity) is not profitable unless it is performed with specialized hardware [a2]. However, mining can be very profitable if it is performed with "stolen" resources, e.g., through covert cryptomining, or simply cryptojacking. Cryptojacking is defined as an unauthorized use of the computing resources on a computer, tablet, mobile phone, or connected home device to mine cryptocurrencies.

Cybercriminals have made several ingenious attempts to spread cryptojackers in the form of malware [webcobra], malicious browser extensions [facexworm], etc., by exploiting vulnerabilities [rtorrent], compromising third-party plug-ins [library], maneuvering misconfigurations [misconfiguration], taking advantage of web-based hosting services [github], and so on. To evade intrinsic detection techniques (e.g., monitoring the processor's usage), some cryptojackers suspend their execution when the victim is using the computer [minergate], use "pop-under" windows to keep mining for a comparatively longer duration [pop_under], or utilize legitimate processes of the operating system to mine [badshell]. Moreover, merely monitoring CPU load is an ineffective strategy because of both false positives and false negatives [konoth2018minesweeper].

To further aggravate the situation, cryptocurrency mining services (e.g., Coinhive [coinhive], Crypto-Loot [cryptoloot]) integrate easily into websites to monetize the computational power of their visitors. In fact, cryptojacking attacks exceeded ransomware attacks in 2018 and affected five times more systems than ransomware [cryptojacking_compare]. Such exploitation of computational resources causes financial damage - primarily in the form of increased electricity bills (a machine consistently performs heavy computations while cryptomining, which, in turn, continuously draws electricity) - to the victims, who often discover the misuse when the damage has already been done. Additionally, prolonged mining on an incompatible device may also damage the hardware [and1].

On another side, the current state of cryptomining consumes a vast amount of energy. As a representative example, the Bitcoin Energy Consumption Index was created to provide insight into this amount with respect to Bitcoin: the Bitcoin network consumes electricity close to the total demand of Iraq, and a single Bitcoin transaction requires nearly 2.7 times the electrical energy consumed by 100,000 transactions on the VISA network [beci]. Moreover, a recent study [mora2018bitcoin] has suggested that "Bitcoin usage could alone produce enough emissions to push warming above 2°C within less than three decades." The situation would only worsen with illegal/unauthorized/covert cryptomining.

Finally, the abundance of the active cryptocurrencies raises the demand for a generic solution to detect covert cryptomining that does not focus on a particular cryptocurrency.

Fig. 1: A representative example of variation in events while mining different cryptocurrencies and performing some common user-tasks: (a) number of L1-dcache-loads; (b) number of L1-dcache-load-misses; (c) number of instructions; (d) number of dTLB-loads; (e) number of cache-misses; (f) number of context-switches. HPC were polled every 100 ms. The line-points in the graphs do not represent data points and are merely used to make lines distinguishable.

Contribution: In this paper, we focus on detecting covert cryptomining. The major contributions of this paper are as follows:

  1. We propose a novel and efficient approach to detect covert cryptomining. In particular, our approach uses HPC to profile the core of the mining process, i.e., the mining algorithms, on a given processor to accurately identify cryptomining in real-time. We designed our solution to be a generic one, i.e., it is not tailored to a particular cryptocurrency or a specific form (e.g., browser-based) of cryptomining.

  2. We exhaustively assess the quality of our proposed approach. To this end, we designed six different experiments: (1) binary classification; (2) currency classification; (3) nested classification; (4) sample length; (5) feature relevance; and (6) unseen miner programs. For a thorough evaluation of our proposed solution, we considered eleven distinct cryptocurrencies in our experiments. Our results show that our classifier can accurately classify cryptomining activities.

  3. In the spirit of reproducible research, we make our collected datasets and the code publicly available at spritz.math.unipd.it/projects/cryptojackers/.

Organization: The remainder of this paper is organized as follows. Section II presents a summary of the related works. We explain our system’s architecture in Section III and discuss its evaluation in Section IV. Section V addresses the potential limitations of our proposed solution. Finally, Section VI concludes the paper.

II Related works

HPC are special-purpose registers in modern microprocessors that count and store hardware-related activities. These activities are commonly referred to as hardware events (formally, an event is defined as a countable activity, action, or occurrence on a device). HPC are often used to conduct low-level performance analysis and tuning. HPC-based monitoring has very low performance overhead, which makes it suitable even for latency-sensitive systems. Several works have shown the effectiveness of using HPC to detect generic malware [demme2013feasibility, yuan2011security, Wang:HPC], kernel-level rootkits [wang2013numchecker], side-channel attacks [chiappetta2016real], unauthorized firmware modifications [wang2015confirm], etc.

A general-purpose process classification may distinguish a browser application from a media player, or one browser application from another. In the former case, the nature of the applications is different, while in the latter case both applications have the same nature and perform the same operation of rendering pages. Cryptominers have the same nature (of mining), but they essentially perform very different underlying operations due to different proof-of-works, and they also require different compute resources (e.g., BTC mining is processor-oriented while XMR mining is memory-oriented). Hence, a comparison of our work with general-purpose process classification methods falls out of the scope of this paper.

On another side, there is a limited number of works on detecting cryptomining. Bonneau et al. [bonneau2015sok] discuss open research challenges of various cryptocurrencies and their mining. Huang et al. [huang2014botcoin] present a systematic study of Bitcoin mining malware and show that modern botnets tend to do illegal cryptomining. Eskandari et al. [eskandari2018first] present a survey of in-browser cryptomining. Other works [konoth2018minesweeper, liu2018novel, wang18esorics, rauchberger2018other, ruth2018digging] focus particularly on browser-based mining. However, only a limited number of cryptocurrencies can be mined in web browsers. MineGuard [tahir2017mining] focuses on detecting cryptomining operations in cloud infrastructure.

Our work is different from the state-of-the-art on the following dimensions: (1) our proposed solution is a generic solution that is not tailored to a particular cryptocurrency or a specific form (e.g., browser-based) of cryptomining on computers; and (2) we tested our solution against all the cryptocurrencies mined by the top-10 mining pools, which collectively represent the largest portion of the cryptomining business.

III System architecture

We elucidate the key concept behind our approach in Section III-A, our data collection phase in Section III-B, the selection of cryptocurrencies in Section III-C, and our classifier's design in Section III-D.

III-A Fundamental intuition of our approach

The task of cryptomining requires a miner to run the core Proof-of-Work (PoW; we use the term "PoW" to represent different consensus algorithms) algorithm repetitively to solve the cryptographic puzzle. At a coarse-grained level, some PoW algorithms are processor-oriented (e.g., BTC) while some are memory-oriented (e.g., XMR) due to their underlying design. At a fine-grained level, each PoW algorithm has its own unique mathematical/logical computations (or, in other words, sequence of operations). Thus, upon execution, each algorithm affects some specific events on the processor more than others. Consequently, when an algorithm is executed repetitively, the "more" affected events outnumber the other - relatively less affected - events. It means that a discernible signature can be built using the relevant events for a PoW algorithm. As a representative example, Fig. 1 depicts the variation in events while mining different cryptocurrencies and performing some common user-tasks. LTC, for instance, shows a more erratic pattern in cache-misses compared to the other events affected during LTC mining. On the other hand, a Skype video call shows more disparity in context-switches.

In practice, there is a finite number of PoW algorithms upon which the cryptocurrencies are established. Thus, we concentrate on the mining algorithms instead of individual currency in our solution. The primary advantage of our approach is that the signature built for an algorithm would be able to identify even polymorphic, metamorphic, and heavily obfuscated implementations of that algorithm because the core PoW algorithm - that we profile in our solution - remains the same. To this end, we use supervised machine learning (explained in Section III-D) to construct signatures and build our classifier.

On another side, an adversary may attempt to circumvent such signature-based detection in the following ways: (1) by controlling/limiting the mining; or (2) by neutralizing the signatures. Limiting the mining would reduce the hashing rate, which would indeed make the mining less profitable. To neutralize the signatures, instead, the adversary has to overcome two main hurdles. First, the adversary has to find computation(s) that change only those events that are unrelated to the PoW algorithm. Second, the adversary has to run these computation(s) in parallel to the PoW algorithm, which would again hamper the hashing rate, and thus the profit.

III-B Data collection

To better explain our work, we first describe what data we collect and how we collect it. We used the perf [perf] tool to profile the processor's events using HPC. In particular, we focus on hardware events (basic events, measured by Performance Monitoring Units (PMU); e.g., branch-misses), software events (measurable by kernel counters; e.g., page-faults), and hardware cache events (data- and instruction-cache hardware events; e.g., cache-misses) on the CPU, as the mining processes - depending on their design - require different types of resources. We profiled each program of both the positive (mining) and negative (non-mining) class individually and collected a total of 50 samples per program. Each sample consists of recordings of 28 events (described in TABLE I) for 30 seconds with a sampling rate of 10Hz, which means that each sample comprises 300 readings of 28 events, i.e., 8400 readings. To obtain clean signatures: (1) we profiled each program in its stable stage, i.e., omitting the bootstrapping phase; and (2) we restarted the system to remove any trace of the previous sample.
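As an illustration of this collection step, the sketch below builds a perf stat invocation that polls an (abbreviated) event list at 100 ms intervals and parses perf's interval-mode CSV output. The helper names are ours, and the paper's actual collection scripts may differ.

```python
# Subset of the 28 monitored events (abbreviated for illustration).
EVENTS = ["instructions", "cache-misses", "context-switches",
          "dTLB-loads", "L1-dcache-loads", "branch-misses"]

def perf_command(pid, duration_s=30, interval_ms=100):
    """Build a `perf stat` argument list that polls the given events on
    process `pid` every `interval_ms` ms for `duration_s` seconds, with
    machine-readable CSV output (`-x ,`). The list can be passed to
    subprocess.run()."""
    return ["perf", "stat", "-x", ",",
            "-e", ",".join(EVENTS),
            "-I", str(interval_ms),
            "-p", str(pid),
            "--", "sleep", str(duration_s)]

def parse_interval_line(line):
    """Parse one CSV line of `perf stat -x, -I` output into
    (time_in_seconds, count_or_None, event_name). perf prints
    `<not counted>`/`<not supported>` when a counter was unavailable."""
    fields = line.strip().split(",")
    time_s = float(fields[0])
    count = int(fields[1]) if fields[1].isdigit() else None
    return time_s, count, fields[3]
```

Thirty seconds at 100 ms intervals yields the 300 readings per event described above.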

N. Mining pool
1 BTC.com
2 AntPool
3 ViaBTC
4 SlushPool
5 F2Pool
6 BTC.top
7 Bitclub.network
8 BTCC
9 BitFury
10 BW.com
Cryptocurrencies mined: BCD, BCH, BTC, BTM, DASH, DCR, ETC, ETH, LTC, SBTC, SC, UBTC, XMC, XMR, XZC, ZEC
TABLE II: Cryptocurrencies mined by the top-10 mining pools
Event Type Description
branch-instructions HW N. of retired branch instructions.
branch-load-misses HW N. of branch load misses.
branch-loads HW N. of branch load accesses.
branch-misses HW N. of mispredicted branch instructions.
bus-cycles HW N. of bus cycles, which can be different from total cycles.
cache-misses HC N. of cache misses.
cache-references HC N. of cache accesses.
context-switches SW N. of context switches.
cpu-migrations SW N. of times the process has migrated.
dTLB-load-misses HC N. of load misses at data TLB.
dTLB-loads HC N. of load hits at data TLB.
dTLB-store-misses HC N. of store misses at data TLB.
dTLB-stores HC N. of store hits at data TLB.
instructions HW N. of retired instructions.
iTLB-load-misses HC N. of instruction fetches that missed instruction TLB.
iTLB-loads HC N. of instruction fetches that queried instruction TLB.
L1-dcache-load-misses HC N. of load misses at L1 data cache.
L1-dcache-loads HC N. of loads at L1 data cache.
L1-dcache-stores HC N. of stores at L1 data cache.
LLC-load-misses HC N. of load misses at the last level cache.
LLC-loads HC N. of loads at the last level cache.
LLC-store-misses HC N. of store misses at the last level cache.
LLC-stores HC N. of stores at the last level cache.
mem-loads HC N. of memory loads.
mem-stores HC N. of memory stores.
node-load-misses HC N. of load misses at Non-Uniform Memory Access (NUMA) node.
node-loads HC N. of load hits at NUMA node.
node-store-misses HC N. of store misses at NUMA node.
node-stores HC N. of store hits at NUMA node.
page-faults SW N. of page faults.
ref-cycles HW N. of total cycles; not affected by CPU frequency scaling.
task-clock SW The clock count specific to the task that is running.
TABLE I: The events that we monitor using HPC. Here, HW = hardware, SW = software, and HC = hardware cache event.

For the positive class, we profiled a total of 11 cryptocurrencies discussed in Section III-C. As the representatives of the negative class, we chose: 3D rendering; 7z archive extraction of tar.gz files; H.264 video encoding of raw video; solving the mqueens problem; Nanoscale Molecular Dynamics (NAMD) simulation; Netflix movie playback; execution of the Random Forest (RF) machine learning algorithm; Skype video calls; stress-ng [stress-ng] stress test with CPU, memory, I/O, and disk workers together; playing the Team Fortress 2 game; and Visual Molecular Dynamics (VMD) modeling and visualization. It is worth mentioning that these user-tasks represent low to high resource-intensive tasks.

We used two different systems to build our dataset for the experiments. The configurations of these systems are as follows: (1) S1, a laptop with an Intel Core i7-7500U @ 2.70 GHz (1 socket x 2 cores x 2 threads = 4 logical compute resources) processor, 8 GB memory, 512 GB SSD storage, NVIDIA GeForce 940MX 2 GB dedicated graphics card, and Linux kernel 4.14; and (2) S2, a laptop with an Intel Core i7-8550U @ 1.80 GHz (1 socket x 2 cores x 4 threads = 8 logical compute resources) processor, 16 GB memory, 512 GB SSD storage, and Linux kernel 4.14.

All miner programs and the perf tool were launched in user-mode. Even though we did not use any system-level privileges, we believe that using root permissions for defense against cryptojacking is reasonable. It is worth emphasizing that even though the dataset has been accumulated in a controlled setup, our experiments (discussed in Section IV) well simulate real-world scenarios, where samples are collected in real time.

III-C Cryptocurrencies and miners

The probability of solving the cryptographic puzzle during mining is directly proportional to the miner's computational power/resources. Consequently, miners pool their resources to combine their hashing power with the aim to consistently earn a portion of the block reward by solving blocks quickly. Typically, mining pools are characterized by their hashing power. TABLE II shows the top-10 mining pools [top10] and the cryptocurrencies mined by them. At the time of writing, these ten mining pools collectively constitute the biggest share (84% during Q3 2018) of the cryptomining business. Please refer to TABLE B.1 (in Appendix B) for the acronyms and their corresponding cryptocurrency.

We considered all the cryptocurrencies mentioned in TABLE II in our experiments. We used open-source miner programs to mine these cryptocurrencies. Each miner program was configured to mine with public mining pools and to utilize all the available CPUs present on the system. At the time of our experiments, the miner program for SC was not able to mine using only the CPU. Hence, we excluded SC from our experiments. To compensate for SC, we included QRK, whose mining algorithm - in contrast to the other cryptocurrencies - uses multiple hashing algorithms. TABLE III shows the mining algorithm of different cryptocurrencies and the CPU miners that we used.

Cryptocurrency Mining algorithm CPU miner
BCD X13 cpuminer-opt 3.8.8.1
BCH, BTC, SBTC, UBTC SHA-256 cpuminer-multi 1.3.4
BTM Tensority bytom-wallet-desktop 1.0.2
DASH X11 cpuminer-multi 1.3.4
DCR Blake256-r14 cpuminer-multi 1.3.4
ETC, ETH Ethash (Modified Dagger-Hashimoto) geth 1.7.3
LTC scrypt cpuminer-multi 1.3.4
QRK BLAKE + Grøstl + Blue Midnight Wish + JH + Keccak (SHA-3) + Skein cpuminer-multi 1.3.4
SC BLAKE2b gominer 0.6
XMC, XMR CryptoNight cpuminer-multi 1.3.4
XZC Lyra2z cpuminer-opt 3.8.8.1
ZEC Equihash Nicehash nheqminer 0.3a
TABLE III: Mining algorithm and CPU miner for different cryptocurrencies

Since our approach focuses on the underlying core PoW algorithm, we considered one currency for every mining algorithm mentioned in TABLE III. Hence, we excluded BCH, SBTC, UBTC, ETC, and XMC from our study. As a proof-of-concept implementation, we considered only CPU-based miner programs because each computer has at least one CPU, which cryptojackers can harness to mine. Nevertheless, our approach is also valid to distinguish GPU-based miners because dedicated profiling tools, such as the nvprof [nvprof] tool for NVIDIA GPUs, allow us to monitor GPU events. Apart from most of the standard events found on CPUs, GPUs have several dedicated events that can assist in creating unique signatures for GPUs.

III-D Classifier design

In this section, we elucidate the design of our classification methodology. Algorithm 1 describes the pipeline of our classifier.

1:  for each run from 1 to 10 do
2:     Create raw_train_set and raw_test_set by 90-10% stratified partitioning.
3:     Data preprocessing: replace NaN values in raw_train_set and raw_test_set with the arithmetic mean of the considered event.
4:     Feature engineering: train_set := Extract_feature(raw_train_set); test_set := Extract_feature(raw_test_set).
5:     Feature scaling: scaler := StandardScaler(); scaler.fit(train_set); scaler.transform(train_set); scaler.transform(test_set).
6:     Feature selection: compute the features' importance with forests of trees on train_set and select the most relevant features.
7:     Training: learn the model parameters for the given classifier (RF/SVM) on the training set using grid search with 5-fold stratified CV.
8:     Predict/classify the test_set.
9:  end for
Algorithm 1: Pseudo-code for our supervised classification.

Our supervised classification algorithm begins with splitting the base-dataset of 1100 samples (2 classes x 11 instances x 50 samples) into 90-10% stratified train-test sets, denoted as raw_train_set and raw_test_set. Then, these subsets are processed as follows:

  1. Data preprocessing: The first step of any machine learning-based classification is to process the raw datasets to fix any missing value. Since each event we monitor returns a numerical value, we replace the missing values, if any, with the arithmetic mean of the respective event.

  2. Feature engineering: In this step, we obtain features that can be used to train a machine learning model for our prediction problem. Here, we compute 12 statistical functions (listed in TABLE IV) for every event. This step converts each sample consisting of 300 readings (rows) x 28 events (columns) to a single row of 336 (28 events x 12 features) data-points. The features extracted in this phase, hereinafter referred to as train_set and test_set, are used for the subsequent stages.

    0.2, 0.4, 0.6, and 0.8 quantile; 1, 2, and 3 sigma; arithmetic and geometric mean; kurtosis; skewness; variance
    TABLE IV: The statistical functions that we used for our feature engineering phase
  3. Feature scaling: This is an essential step to eliminate the influence of large-valued features, because features with larger magnitude can dominate the objective function and thus prevent an estimator from learning from other features correctly. Hence, we standardize the features using a standard scaler, which removes the mean and scales the features to unit variance.

  4. Feature selection: In machine learning, feature selection or dimensionality reduction is the process of selecting a subset of relevant features that are used in model construction. It aims to improve estimators’ accuracy as well as to boost their performance on high-dimensional datasets. To do so, we calculate the importance of features using forests of trees [forestoftrees] and select the most relevant features.

  5. Training: The training phase consists of learning the model parameters for the given classifier on the training set, i.e., train_set. Given the nature of the problem, we resort to supervised machine learning procedures. In particular, we employed two of the most successful machine learning methods for classification, namely Random Forest (RF) [Ho:1995] and Support Vector Machine (SVM) [Cortes:1995]. For model selection, we use grid search with 5-fold Cross Validation (CV). The validated hyper-parameters for RF and SVM are shown in TABLE V and TABLE VI, respectively. We chose standard ranges of values for the hyper-parameters [HsuLibsvmTutorial2003].

    Parameter Validated values Effect on the model
    n_estimators {10, 25, 50, 100, 125, 150} Number of trees used in the ensemble.
    max_depth [2, ∞) Maximum depth of the trees.
    max_features 'auto', 'log2' Number of features to consider when looking for the best split.
    split_criterion gini, entropy Criterion used to split a node in a decision tree.
    bootstrap true, false Bootstrap Aggregation (a.k.a. bagging) is a technique that reduces model variance (overfitting) and improves the outcome of learning on limited samples or unstable datasets.
    random_state 10 The seed used by the random number generator.
    TABLE V: Hyper-parameters validated for RF classifier
    Parameter Validated values Effect on the model
    kernel 'rbf', 'poly', 'sigmoid' Specifies the kernel type to be used in the algorithm.
    C [, ] Regularization parameter that controls the trade-off between achieving a low training error and a low testing error, that is, the ability to generalize the classifier to unseen data.
    gamma 'auto', [, ] Shape parameter of the RBF kernel, which defines how much an example influences the final classification.
    degree default=3 Degree of the polynomial kernel function ('poly'). Ignored by all other kernels.
    random_state 10 The seed of the pseudo random number generator used when shuffling the data for probability estimates.
    TABLE VI: Hyper-parameters validated for SVM classifier
  6. Prediction: Finally, prediction is made on test_set.
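The feature-engineering step (12 statistical functions per event, yielding 336 data-points per sample) could be sketched as follows. This is an illustrative reconstruction, not the paper's code; in particular, how the 1-, 2-, and 3-sigma statistics are summarized (here as mean + k*std) is our assumption.

```python
import numpy as np

def extract_features(sample):
    """Turn one sample of shape (300 readings, 28 events) into a single
    row of 28 x 12 = 336 features, following TABLE IV: 0.2/0.4/0.6/0.8
    quantiles, 1/2/3-sigma statistics (assumed here to be mean + k*std),
    arithmetic and geometric mean, kurtosis, skewness, and variance,
    computed per event."""
    feats = []
    for col in sample.T:                   # one event (column) at a time
        mu, sigma = col.mean(), col.std()
        centred = col - mu
        m2 = (centred ** 2).mean()
        feats.extend(np.quantile(col, [0.2, 0.4, 0.6, 0.8]))  # 4 quantiles
        feats.extend(mu + k * sigma for k in (1, 2, 3))       # sigma stats
        feats.append(mu)                                      # arithmetic mean
        feats.append(np.exp(np.log(np.clip(col, 1e-12, None)).mean()))  # geometric mean
        feats.append((centred ** 4).mean() / (m2 ** 2 + 1e-12))    # kurtosis
        feats.append((centred ** 3).mean() / (m2 ** 1.5 + 1e-12))  # skewness
        feats.append(col.var())                               # variance
    return np.asarray(feats)

# A synthetic sample: 300 readings of 28 events.
rng = np.random.default_rng(0)
row = extract_features(rng.random((300, 28)))
```

Applied to every sample, this produces the train_set and test_set matrices used by the later stages.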

The process is repeated ten times for a given experiment and the final results are computed over these ten runs.
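Within each run, the feature-scaling step (step 5 of Algorithm 1) fits the scaler only on the training split, so no information from the test split leaks into the scaling. A minimal NumPy equivalent of what StandardScaler does (array shapes and random data are illustrative):

```python
import numpy as np

def fit_standard_scaler(train):
    """Learn per-feature mean and standard deviation on the training
    split only (mirrors fitting StandardScaler on train_set alone)."""
    mu = train.mean(axis=0)
    sigma = train.std(axis=0)
    sigma[sigma == 0] = 1.0            # guard against constant features
    return mu, sigma

def standardize(data, mu, sigma):
    """Remove the mean and scale to unit variance."""
    return (data - mu) / sigma

# Illustrative shapes: a 90-10% split of 1100 samples with 336 features.
rng = np.random.default_rng(1)
train_set = rng.normal(5.0, 3.0, size=(990, 336))
test_set = rng.normal(5.0, 3.0, size=(110, 336))
mu, sigma = fit_standard_scaler(train_set)
train_scaled = standardize(train_set, mu, sigma)
test_scaled = standardize(test_set, mu, sigma)
```

After this step, each training feature has zero mean and unit variance, while the test split is transformed with the training statistics.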

IV Evaluation

We thoroughly evaluated our approach by performing an exhaustive set of experiments. We performed the following six different experiments: (1) binary classification; (2) currency classification; (3) nested classification; (4) sample length; (5) feature relevance; and (6) unseen miner programs. TABLE VII describes the sample distribution in our base-dataset for each system, i.e., S1 and S2. Here, sub-classes of the mining task refer to the cryptocurrencies (discussed in Section III-C), while sub-classes of the non-mining task refer to the actual user-tasks that belong to the negative class (mentioned in Section III-B). We use the entire base-dataset (1100 samples per system) for each experiment, unless otherwise stated in an experiment.

Task Sub-classes per task Samples per sub-class Total samples per task
Mining 11 50 550
Non-mining 11 50 550
TABLE VII: Dataset: name of the task, sub-classes per task, samples per sub-class, and total samples per task for each system

We evaluated our classifier using standard classification metrics: Accuracy, Precision, Recall, and F1 score. To increase the statistical significance of our results, we report the mean and the margin of error with a 95% confidence interval from ten runs of each experiment for each evaluation metric. See Appendix A for details of these evaluation metrics and the related statistical terms. Boldface indicates the best result for a metric, and results are reported as mean ± margin of error.
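For n = 10 runs, the reported mean and 95% margin of error can be computed as below. The paper does not state the exact critical value it uses, so the Student's t constant here is our assumption.

```python
import math
import statistics

def mean_and_margin(scores, t_crit=2.262):
    """Mean and margin of error over repeated runs. t_crit = 2.262 is
    the two-sided Student's t critical value at 95% confidence for
    n = 10 runs (9 degrees of freedom); using the t-distribution is
    our assumption about how the intervals are computed."""
    n = len(scores)
    mean = statistics.fmean(scores)
    sem = statistics.stdev(scores) / math.sqrt(n)   # standard error
    return mean, t_crit * sem

# Hypothetical F1 scores from ten runs of one experiment.
f1_runs = [0.98, 0.99, 0.97, 0.99, 0.98, 0.99, 0.98, 0.97, 0.99, 0.98]
f1_mean, f1_moe = mean_and_margin(f1_runs)   # reported as mean ± margin
```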

IV-A Binary classification

Our main goal is to identify whether a given instance represents the mining task or not. Hence, in this experiment, the label of each sample was defined as the positive or negative class, accordingly. TABLE VIII presents the results of the binary classification using both RF and SVM.

System Method Accuracy Precision Recall F1
S1 RF
SVM
S2 RF
SVM
TABLE VIII: Results for binary classification

Both RF and SVM yielded superior performance. However, RF performed better than SVM on both systems, i.e., S1 and S2. For this reason, we report the results only for RF in the subsequent experiments.

IV-B Currency classification

The aim of this experiment is to understand the difficulty level of classification among various cryptocurrencies. Therefore, the input dataset for this experiment contained instances only of the cryptocurrencies. TABLE IX lists the results of the currency classification.

System Accuracy Precision Recall F1
S1
S2
TABLE IX: Results for currency classification

Fig. 2 depicts the confusion matrices for the classification among various cryptocurrencies to provide a better perception of the results. Here, Fig. 2(a) and Fig. 2(b) correspond to S1 and S2, respectively. The confusion matrices are drawn using the aggregate results from all ten runs. Currency classification is a multi-class classification problem, and some cryptocurrencies were misclassified among each other (see Fig. 2). Hence, the results are slightly lower than those of the binary classification.

(a) S1
(b) S2
Fig. 2: Confusion matrix for classification among various cryptocurrencies

IV-C Nested classification

This experiment represents a simulation of a real-world scenario. Here, we first classify whether a given instance belongs to the positive class. If so, we identify the cryptocurrency it belongs to. Essentially, nested classification is equivalent to performing currency classification on the instances classified as positive in the binary classification. Fig. 3 depicts the hierarchy of nested classification.

Fig. 3: Hierarchy of nested classification (binary classification first; currency classification follows if an instance is classified as '+')

TABLE X shows the results of the nested classification. In the worst case, we expect the outcome of this experiment to be lower than that of the binary classification and currency classification together, because a crucial aspect of such staged classification is that an error made in the prediction during the primary stage influences the subsequent stage; the results for S1 show this phenomenon. However, in a common scenario, the expected outcome of this experiment would lie between the results for the binary classification and the currency classification; the results for S2 show this effect.

System Accuracy Precision Recall F1
S1
S2
TABLE X: Results for nested classification

IV-D Sample length

The objective of this experiment is to understand the effect of the sample length. For deployment in a real-world scenario, any solution - apart from being accurate - must be able to detect cryptojackers rapidly. To this end, we performed the binary classification on samples of a length of 5, 10, 15, 20, 25, and 30 seconds, each in a separate experiment. It is worth mentioning that we used samples of identical length for both training and testing. Fig. 4 shows the F1 score when using samples of different lengths.

Fig. 4: F1 score when using samples of different lengths (whiskers represent margin of error)

As explained in Section III-A, the task of mining is to repeatedly execute the core PoW algorithm. Hence, even samples of shorter length can grasp the signature. As shown in Fig. 4, our system can achieve high performance with samples of five seconds. The dip in the curve for S1 corresponds to the thousandths digit of the F1 score. For the sake of brevity, we omitted the results for samples shorter than five seconds and only focus on the minimum sample length required to attain high performance with our solution.

IV-E Feature relevance

Next, we focus on our feature selection process (mentioned in Section III-D). After calculating the importance of the features, we sorted them in ascending order of importance and selected the first x% of the features to perform the binary classification. Fig. 5 depicts the F score when using the first x% of features.

Fig. 5: F score when using the first x% of features (whiskers represent the margin of error)

Since the features are sorted in ascending order of their importance, we begin with the least significant features. Intuitively, including more important features further improves the classification. As shown in Fig. 5, our classifier attains high performance on both systems using only the first 40% (i.e., the less relevant) features, which validates our feature engineering and selection process.
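The ascending-importance selection can be sketched as below. The feature matrix, labels, and the use of a random forest's `feature_importances_` are illustrative assumptions; the paper's own importance measure may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
X = rng.random((200, 10))
y = (X[:, 0] + X[:, 1] > 1).astype(int)   # only two features truly matter

clf = RandomForestClassifier(random_state=0).fit(X, y)
# Sort feature indices in ascending order of importance,
# as in the experiment above.
order = np.argsort(clf.feature_importances_)

def first_percent(pct):
    """Indices of the first pct% (least important) features."""
    k = max(1, int(len(order) * pct / 100))
    return order[:k]

# e.g., retrain and evaluate with only the first 40% of features
subset = first_percent(40)
X_subset = X[:, subset]
```

Because the ordering is ascending, `first_percent(40)` deliberately excludes the most important features, matching the experiment's setup.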

IV-F Unseen miner programs

There can be several different miner programs available to mine a given cryptocurrency. Since these programs come from different developers/sources, there can be variations in the behavior of the miner program itself, e.g., in the code executed before/after the PoW function or in how a correct nonce found while mining is handled on the programming side. Training the model for each program may not be feasible for a variety of reasons. Hence, to investigate the effectiveness of our approach in such a situation, we set up another experiment. Here, we selected the binary classification as the target, where the samples from all the mining and non-mining tasks were labeled as the positive or negative class, respectively. Additionally, we chose two further miner programs for BTC, namely, BFGMiner 5.5 and cgminer 4.10, and collected an additional 50 samples for each on both S1 and S2 separately. In the training phase, we used samples from one of the three miner programs for BTC; during the testing phase, we used samples from one of the other two. TABLE XI presents the results of classifying samples from miner programs that were unseen in the training phase.

System Task Accuracy Precision Recall F1
S1
S2
TABLE XI: Results for unseen miner programs

The notation P_i → P_j means that the training was done with the samples from P_i while the testing was done with the samples from P_j for BTC. Here, P_1 = cpuminer-multi 1.3.4, P_2 = BFGMiner 5.5, P_3 = cgminer 4.10. It is important to mention that these results are for the classification of all the mining and non-mining tasks with BTC being trained and tested upon samples from different programs.

As discussed in Section III-A, the miners have to execute the same core PoW algorithm for a given cryptocurrency. Hence, samples from different miner programs for the same cryptocurrency retain the same signatures, which is reflected in our results.
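The cross-program evaluation protocol can be sketched as follows. The feature shapes, sample counts, and classifier are hypothetical; only the three BTC miner-program names come from the experiment above.

```python
import numpy as np
from itertools import permutations
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
# Hypothetical 50 samples (8 HPC events each) per BTC miner program.
samples = {p: rng.random((50, 8)) for p in
           ["cpuminer-multi 1.3.4", "BFGMiner 5.5", "cgminer 4.10"]}
negatives = rng.random((50, 8))     # non-mining tasks

# Every ordered train/test pair P_i -> P_j with i != j.
runs = list(permutations(samples, 2))
for train_p, test_p in runs:
    X_train = np.vstack([samples[train_p], negatives])
    y_train = np.array([1] * 50 + [0] * 50)
    clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    preds = clf.predict(samples[test_p])   # ideally all labeled as mining
```

Since all three programs execute the same core PoW algorithm, their samples should carry the same signature regardless of which program supplied the training data.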

V Limitations

In this section, we address the potential limitations of our proposed approach.

V-A Zero-day cryptocurrencies

A zero-day cryptocurrency would be a currency that uses a completely new or custom PoW algorithm that was never seen before. As a matter of fact, for a cryptocurrency to obtain market value: (1) its core-network should be supported by miners/pools; and (2) its PoW algorithm must be accepted by the crypto-community and tested mathematically for its robustness. Therefore, the PoW algorithm for a new cryptocurrency would become public by the time it gets ready for mining, which would give us sufficient time to capture this new cryptocurrency’s signature and to train our model.

Importantly, miners prefer to mine cryptocurrencies that are more profitable and avoid hashing the less rewarding ones. As it happens to be, more profitable cryptocurrencies are indeed popular and their PoW algorithms are certainly known to the public. In our experiments, we considered all the popular cryptocurrencies, and our results (presented in Section IV) demonstrate the high quality of our proposed approach along various dimensions.

V-B Scalability

The key concept of our approach is to profile the behavior of a processor’s events for mining algorithms. Since there is only a finite number of CPUs/GPUs, procuring their signatures is only a matter of data collection. This might appear a ponderous job and may be seen as a limitation of our work. However, once it is accomplished for the available CPUs/GPUs, maintaining it is relatively simple, as only a limited number of new CPUs/GPUs are released over time.

V-C Process selection

As mentioned in Section III-B, our system requires per program/process-based recording of HPC for different events as the input to the classifier. In practice, several processes run in the system. Hence, monitoring each process may consume time and can be seen as a limitation of our work. However, as shown in Fig. 4, our system can achieve high performance even with samples of 5 seconds. On the other hand, miner programs attempt to use all the available resources. Therefore, initially sorting/filtering processes based on their resource usage can help boost the detection process in real time.

V-D Restricted mining

A mining strategy to evade detection by our proposed methodology is restricted mining, which aims to change the footprint of the mining process. Here, the miner program/process can be modified to perform arbitrary operations during mining. However, such maneuvers would directly affect the hashing rate and consequently the profits, making mining less appealing. Nevertheless, as with any signature-based detection technique, this may be seen as a limitation of our work.

VI Conclusion

Cybercriminals have developed several proficient ways to exploit cryptocurrencies with an aim to commit many unconventional financial frauds. Covert cryptomining is one of the most recent means to monetize the computational power of the victims. In this paper, we presented our efficient methodology to identify covert cryptomining. Our solution has a broader scope as it targets the core PoW algorithms and uses the low-performance-overhead HPC that are present in modern processors to create discernible signatures. We tested our generic approach against a set of rigorous experiments that include eleven distinct cryptocurrencies. We found that our classifier attains high performance even with short samples of five seconds. In the future, we will investigate the impact of samples from different operating systems and virtualized environments. We also hope to release a desktop application for run-time identification of covert cryptomining.

Acknowledgment

Ankit Gangwal is pursuing his Ph.D. with a fellowship for international students funded by Fondazione Cassa di Risparmio di Padova e Rovigo (CARIPARO).


Appendix A Standard definitions

Here, we present the definitions of some concepts that we used in our work.


Standard scaler

transforms each feature in such a way that its mean becomes zero and its standard deviation becomes one. Specifically, given a feature $x$ and one of its values $x_i$, the following formula is applied:

$$z_i = \frac{x_i - \mu}{\sigma},$$

where $\mu$ and $\sigma$ are the mean and standard deviation of the variable $x$.
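The transformation can be verified numerically; this minimal sketch uses NumPy's population standard deviation, matching the definition above.

```python
import numpy as np

def standard_scale(x):
    """z_i = (x_i - mu) / sigma, so the result has mean 0 and std 1."""
    return (x - x.mean()) / x.std()

x = np.array([2.0, 4.0, 6.0, 8.0])
z = standard_scale(x)
```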

Standard error

of a variable $x$ is expressed as:

$$SE = \frac{\sigma}{\sqrt{n}},$$

where $n$ and $\sigma$ are the number of observations and the standard deviation of the variable $x$.

Margin of error

is the range of values above and below the sample mean for a given confidence interval. It is calculated as:

$$MoE = z \cdot SE,$$

where $z$ is the coefficient for the selected confidence level; e.g., $z$ is 1.96 for a 95% confidence interval.
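The two quantities compose directly; this small sketch uses illustrative values for the standard deviation and sample size.

```python
import math

def standard_error(sigma, n):
    """SE = sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def margin_of_error(sigma, n, z=1.96):
    """MoE = z * SE; z = 1.96 for a 95% confidence interval."""
    return z * standard_error(sigma, n)

# e.g., sigma = 0.05 over 100 observations: MoE = 1.96 * 0.05/10
moe = margin_of_error(sigma=0.05, n=100)
```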

Accuracy

measures how often the classifier makes the right prediction, defined as the ratio between the number of hits and the number of predictions.

Precision

quantifies the ability of a classifier to not label a negative example as positive. It is computed as the ratio of the number of true positives and the total number of instances labeled as positives.

Recall

quantifies the ability of a classifier to find all the positive instances. It is computed as the ratio of the number of true positives and the total number of positives in the set.

F score

is a single metric that combines both precision and recall via their harmonic mean:

$$F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}.$$
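The four metrics above can be computed from confusion-matrix counts; the counts in the usage line are hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# e.g., 90 true positives, 85 true negatives, 5 false positives, 10 false negatives
acc, prec, rec, f1 = classification_metrics(90, 85, 5, 10)
```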

Appendix B Acronyms

TABLE B.1 describes the acronyms used for different cryptocurrencies.

Acronym Cryptocurrency
BCD Bitcoin Diamond
BCH Bitcoin Cash
BTC Bitcoin
BTM Bytom
DASH Dash
DCR Decred
ETC Ethereum Classic
ETH Ethereum
LTC Litecoin
QRK Quark
SBTC SuperBitcoin
SC Siacoin
UBTC UnitedBitcoin
XMC Monero-Classic
XMR Monero
XZC Zcoin
ZEC Zcash
TABLE B.1: Cryptocurrencies and their acronym

Mauro Conti is Full Professor at the University of Padua, Italy, and Affiliate Professor at the University of Washington, Seattle, USA. He obtained his Ph.D. from Sapienza University of Rome, Italy, in 2009. After his Ph.D., he was a Post-Doc Researcher at Vrije Universiteit Amsterdam, The Netherlands. In 2011 he joined the University of Padua as Assistant Professor, where he became Associate Professor in 2015, and Full Professor in 2018. He has been Visiting Researcher at GMU (2008, 2016), UCLA (2010), UCI (2012, 2013, 2014, 2017), TU Darmstadt (2013), UF (2015), and FIU (2015, 2016, 2018). He has been awarded with a Marie Curie Fellowship (2012) by the European Commission, and with a Fellowship by the German DAAD (2013). His research is also funded by companies, including Cisco, Intel, and Huawei. His main research interest is in the area of security and privacy. In this area, he published more than 250 papers in topmost international peer-reviewed journals and conferences. He is Area Editor-in-Chief for IEEE Communications Surveys & Tutorials, and Associate Editor for several journals, including IEEE Communications Surveys & Tutorials, IEEE Transactions on Information Forensics and Security, IEEE Transactions on Dependable and Secure Computing, and IEEE Transactions on Network and Service Management. He was Program Chair for TRUST 2015, ICISS 2016, WiSec 2017, and General Chair for SecureComm 2012 and ACM SACMAT 2013. He is Senior Member of the IEEE.

Ankit Gangwal received his BTech degree in Information Technology from RTU Kota, India in 2011 and his MTech degree in Computer Engineering from MNIT Jaipur, India in 2016. Currently, he is a Ph.D. student in the Department of Mathematics, University of Padua, Italy with a fellowship for international students funded by Fondazione Cassa di Risparmio di Padova e Rovigo (CARIPARO). His current research interest is in the area of security and privacy of the blockchain technology and novel network architectures.

Gianluca Lain is pursuing a B.Sc. at the Department of Mathematics, University of Padua, Italy. His current research interests include cybersecurity and advanced machine learning techniques. In his spare time, he likes to tamper with hardware components of modern computers and play the piano.

Samuele Giuliano Piazzetta is pursuing a B.Sc. at the Department of Mathematics, University of Padua, Italy. His current research interests include cryptocurrencies and advanced machine learning techniques. In his spare time, he likes drawing and studying digital design.