The explosive progress of deep learning techniques makes an increasing number of app developers pay attention to this technology. Deep learning apps reach a certain scale in the market and keep growing rapidly [xu2019first]
. To lower the threshold for developers to utilize and deploy deep learning models on their apps, companies (i.e., Google) develop deep learning frameworks, including Tensorflow Lite[tensorflowlite]
, Pytorch Mobile[pytorchmobile], Caffe2 Mobile [caff2mobile], MindSpore Lite [mindsporelite], Core ML [coreml] and so on.
To build a deep learning app, mainstream platforms offer two common deploying strategies, on-device deployment and on-cloud deployment [firebaseOndeviceVSOncloud]. On-cloud deployment allows developers to deploy their models on the remote server, and collect the runtime inputs from app users. Then, the inputs are processed by the remote server. Finally, the results are returned to app users. However, the efficiency of this method is often limited by the network quality and power usage [mcintosh2019can]. Even worse, on-cloud deployment has potential risks of user privacy leakage [kumar2020adversary]. Therefore, lots of developers turn to the on-device deployment strategy.
Motivation. The aforementioned mainstream mobile deep learning frameworks and deployment platforms rarely provide developers with solutions or APIs to protect their deep learning models. Unprotected deep learning models are widely used in apps in various fields such as finance, health, and self-driving [Sun2020MindYW]. This may lead to potential user data leakage, commercial information theft, and the malfunction of important functional modules of the app (e.g. a polluted face recoginition model fails to recoginize an authorized user).
State-of-art. Although some studies [li2021deeppayload] focus on attacks on mobile deep learning models, their approaches are often white-box, which means that the attacker can know or even modify the structure (the number, type, and arrangement of layers) and weights (parameters of each layer) of the model. These white-box attacks rely on the source of victim models, which can be extracted by decompressing the victim app. However, a recent study shows that developers are more likely to use encryption or hash code to protect their models [Sun2020MindYW]. It means that the knowledge of a model (i.e., structure and weights) inside the app is always unknown, which nullifies these white-box attacks. Work [huang2021robustness] claims to be a black-box attack method towards deep learning models. However, they still require knowledge of the structure or weight of the model. Such information is commonly unknown in the real world, especially for commercial apps.
Our solution. In this paper, we developed a practical pipeline that covers data preparation, student model (i.e., a substitute model to mimic the behavior of the victim model) learning, and adversarial examples generation. Our black-box attack approach can bypass many existing model protection measures. First, we instrument a DummyActivity to the victim app to invoke the deep learning model (a.k.a teacher model) adopted by the victim app. The raw data fetched from the Internet or public dataset are fed into the teacher model through the DummyActivity to generate the corresponding labels (See Sec. III-A). Next, a substitute model (a.k.a. student model), which is selected to mimic the behavior of the teacher model, is then trained with the labeled data (See Sec. III-B). Last, adversarial attack algorithms are applied to the student model, and the generated adversarial examples are leveraged to attack the teacher model in the victim app (See Sec. III-C).
Contributions. In summary, our main contributions are described as follows:
We propose a black-box attack pipeline, from the model collection to adversarial example generation, to attack deep learning apps.
We conduct a series of experiments to evaluate how different factors affect the attack success rate, including the structure of the student models, the scale of the teaching dataset, and the adversarial attack algorithms; and
We propose a series of strategies to defend against the proposed attacks.
We release the source code and data on the online artefact [onlineartefacts].
Ii-a On-device Deep Learning Model
Developers are increasingly using deep learning models to implement face recognition [yin1029feature], image classification [daniel2010towards], object detection [daniel2010towards], natural language understanding [chen2020unblind], speech recognition [dong2020rtmobile], and recommendation systems [wang2020next] in their apps.
To meet the needs of app developers, vendors build deep learning frameworks for mobile platforms, such as TensorFlow Lite [tensorflowlite] and PyTorch Mobile [pytorchmobile]. They also provide platforms (e.g. TensorFlow Hub [tensorflowhub], MindSpore Model [mindspore]) to share pre-trained deep learning models for mobile apps.
There are two common deploying strategies, on-device deployment and on-cloud deployment. Developers nowadays are inclined to use the on-device method to deploy deep learning models. On the one hand, the hardware of modern mobile devices is qualified to complete computationally intensive deep learning tasks. As the CPU and GPU of modern mobile devices have stronger computing power, on-device inference can be completed promptly. The neural processing units (NPU) in mobile devices further expand the feasibility of on-device deployment of deep learning models [tan2020fastva].
On the other hand, deep learning frameworks provide simple and easy-to-use APIs to invoke the model for inference tasks. For example, as shown in List. 1, Interpreter defined in the TensorFlow Lite is used to load the model and run inference. file_of_a_tflite_model represents the file path of the deep learning model. input presents the user’s input and output
represents the result of the inference which is a vector. Based on the output, the app can give a response to the user. For example, the user feeds an image of a rose asinput into the Interpreter. After inference, the output is assigned to a one-hot vector like [0, 0, 1, 0, …, 0]. According to pre-determined rules from developers, the vector is then interpreted into a human-readable format such as a string ”rose”.
In the apps, developers can define the locations of their deep learning models. A common practice is to store the models in the assets folder (/assets/) of the apk (Android Package, which is the binary format of an app). Data including user input, output vector, and training dataset are invisible to users. Although models in some deep learning apps can be directly extracted by decompiling the apk, lots of them still cannot be fetched even with the state-of-art tools [Sun2020MindYW]. Therefore, the information about the structures and weights of these deep learning models is unknown to attackers.
Ii-B Black-box Attack
Since the information about the victim is always unknown, attacks must be performed in a black-box manner. A black-box attack is not a specific attack technique but describes an attack scenario. In this scenario, the attackers have no internal knowledge of the victim, and cannot modify the internal structure and other attributes of the victim. In our context, the attackers have no internal knowledge of the deep learning model in the victim app and cannot modify the internal attributes of the model. The attackers can only query the model and get its prediction.
Ii-C Adversarial Attack
Deep learning models are vulnerable to adversarial attacks. The adversarial attack utilizes adversarial examples to spoof the deep learning models. An adversarial example is generated by adding tiny perturbations to the input data. It can be represented as , where is the adversarial example, is the input data, is the perturbations and is the multiplier to control the degree of the perturbations [goodfellow2015explaining, huang2021robustness]. An adversarial attack can be briefly summarized in the following example. With a deep learning model for image classification and an input image , is the label of (). Attackers can generate an adversarial example by adding infinitesimal perturbations that a human being cannot distinguish from the original image . But the adversarial example can spoof the victim deep learning model to make an incorrect prediction ().
Before introducing the details of our approach, we first define two key concepts used in our work: teacher model and student model. A teacher model is the deep learning model that is used in the victim app. A student model can be considered as a substitute or imitator of the teacher model. A qualified student model can imitate the teacher’s behaviors well. With the same input, the student and the teacher model are supposed to output the same prediction. A well-behaved student model can guarantee that the adversarial examples generated can be directly transferred to the teacher model.
Our black-box attack pipeline, as illustrated in Fig. 2, consists of three procedures: teaching dataset preparation, student model learning, and adversarial examples generation.
Input: The input of our pipeline is a deep learning app and an unlabeled dataset. The dataset is either crawled from the web or a public dataset that is related to the app’s task. For example, the dataset for an app whose task is flower classification can be images of different kinds of flowers.
Teaching dataset preparation: Given a deep learning app, we first locate and learn how the deep learning model is invoked by tracking the official APIs from deep learning frameworks in the app. Then, we instrument a DummyActivity, in which the teacher model is invoked, to the victim app with Soot [Soot, flowdroid]. Next, we feed the unlabeled dataset into the DummyActivity to generate the corresponding labels (see Sec. III-A).
Student model learning: To imitate the teacher model, we select mainstream deep learning models with the same task as the student model candidates. With the labeled teaching dataset generated, a student model can be trained to imitate the teacher model (see Sec. III-B).
Adversarial examples generation: With the trained student model, adversarial examples can be generated with adversarial attack algorithms (see Sec. III-C).
Output: The output of the whole pipeline is a set of adversarial examples, which are used to attack the victim app.
Iii-a Teaching Dataset Preparation
For a deep learning app, we identify some basic attributes of the teacher model in the app. These attributes mainly consist of the type of its task, the name of classes it predicts, and the formatting requirement of the input.
Type of task and Name of classes : Determining the task of the deep learning model is a prerequisite for collecting a suitable dataset. The name of the output (i.e., classes) of the deep learning model can further help narrow the selection range of the dataset. This information can be obtained by trying the app.
Formatting requirement of the input : The input format of the deep learning model is determined by the model developer. These formatting requirements include size (e.g., 128*128), data type (e.g., int8, float32), and so forth. By reverse engineering the app, these requirements can be obtained manually.
For a plant identifier app, its task is image classification. Its names of classes can be the names of plants (e.g, lily, plane tree). Its formatting requirement of the input can be a 2552553 RGB image.
Based on this information, we search a public dataset from the web. However, it is common that no public related dataset is available for real-world apps. There are two options to collect the dataset for such apps. The first is to use a pre-trained student model without fine-tuning by the teaching dataset, which is simple and fast. The second is to crawl the related data from the Internet. We compare these two options with experiments in Sec. IV-C.
Since the input format of the deep learning model is determined by the model developer, the collected dataset needs to be preprocessed to make it meet the specific formatting requirements. Based on the previously obtained information about the input formatting requirements, the input data can be manipulated (e.g., zoom in size, update figure accuracy) to meet the input formatting requirements.
The dataset we crawled is unlabeled. However, to better imitate the teacher model, we need to train the student model with a labeled dataset. Therefore, we need to extract the labels of the dataset with the teacher model. For a black-box attack, it can be unrealistic to directly retrieve the teacher model from the app. Therefore, we perform the following steps to transfer an unlabeled dataset into a labeled one :
STEP 1: We collect the API patterns of mainstream mobile deep learning model deployment frameworks. The corresponding APIs can be found in the official docs (i.e., Tensorflow Lite inference [tensorflowliteinference], Pytorch Mobile inference [pytorchinference]).
STEP 2: We use Soot to locate the signatures of these APIs in the app.
STEP 3: We instrument DummyActivity into the app with Soot and load the deep learning model inside the onCreate() method of DummyActivity with the same API found in STEP 2.
STEP 4: We pass the dataset to the inference API and output the corresponding labels with logs.
STEP 5: We leverage the Android Debug Bridge (ADB) to launch our DummyActivity (i.e., am start -n DummyActivity) to trigger the teacher model. As a result, we can obtain the corresponding labels for the inputs.
Example. The Mushroom Identifier app (com.gabotechindustries.mushroomIdentifier) detects and identifies mushrooms in the picture and predicts their categories. This app has a class called ”Xception”. List. 2 summarizes how this app invokes the deep learning model in ”Xception” with the APIs provided by TensorFlow Lite.
After instrumenting the DummyActivity to the app, we launch the DummyActivity with an ADB command (i.e., am start -n DummyActivity). As a result, the corresponding labels of can be obtained. The labeled dataset is then leveraged to train the student model.
Iii-B Student Model Learning
Selection criteria for student model. The adversarial examples generated based on the student model are used to attack the teacher model. The selection criteria of the student model is that it should be able to complete the same task as the teacher model. Different student models can result in different attack success rates. We discuss this correlation in Sec. IV-B.
Student model learning.
In our approach, the student model can be trained in a supervised way with the teaching dataset generated in the previous step. In supervised learning, the model is trained on a labeled dataset, where the labels are used to evaluate the model’s prediction accuracy. To improve the prediction accuracy, the model needs to adjust the weights in the process of training. The adjusting strategy relies on the error between the model’s predictions and the ground truths(i.e., labels). The loss function is used to calculate this error. For example, the CrossEntropyLoss is widely used in image classification[DBLP:conf/gcce/TakedaYM20].
Example. Recall the app in Sec. III-A, its task is to identify mushrooms in an input image and predict their categories. Thus, the selected student model must be a model related to image classification. The image classification model VGGll [simonyan2014very] can be used as the student model to be trained with the labeled teaching dataset generated in Sec. III-A. Note that a general image classification model can be trained to identify mushrooms with a suitable dataset.
Iii-C Adversarial Examples Generation
With the trained student model in the previous step, the next step is to use the student model to generate adversarial examples that can spoof the teacher model. The adversarial examples are proved to have the property of transferability [liu2017delving]
which means that the adversarial examples generated for one model through an attack algorithm can also attack another model. Motivated by this, adversarial attack algorithms are utilized to generate adversarial examples based on the student model, and these generated adversarial examples can be directly applied to attack the teacher model in the victim app. Note that different domains (e.g., NLP, computer vision) require different adversarial attack algorithms. For example, in computer vision, the following algorithms can be representative: FGSM[goodfellow2015explaining], RFGSM [tramer2020ensemble], FFGSM [kim2020torchattacks], MIFGSM [dong2018boosting], PGD [madry2017towards], TPGD [zhang2019theoretically], and BIM [kurakin2016adversarial]. Different attack algorithms can result in different attack performances. We discuss this correlation in Sec. IV-D.
Iii-D Robustness Evaluation
Given a teacher model, the teaching dataset is divided into training and testing sets. The training set is used to train the student model and the testing set is used to generate adversarial examples. Compared with current methods [huang2021robustness] that use a very limited number of attack examples, our evaluation set is much larger and thus our results are more convincing.
Suppose there are total inputs in the testing set for a deep learning app. (out of
) inputs can be correctly classified by the teacher model. Theseinputs are used to generate adversarial examples since it is inappropriate to generate an adversarial example when the input cannot be correctly recognized by the teacher model. Then we generate adversarial examples with the student model through an existing attack algorithm (e.g., FGSM [goodfellow2015explaining]). These adversarial examples are then fed to the teacher model to obtain the prediction results. An attack is considered to be successful if the adversarial example is misclassified by the teacher model. If out of adversarial examples are misclassified by the teacher model, then the attack success rate is:
The quality of the student model is key to reach a high attack success rate. A qualified student model can imitate the behavior of the teacher model well and generate adversarial examples with higher transferability. These adversarial examples are more likely to spoof the teacher model [liu2017delving]. Therefore, we conduct experiments to investigate the influence factors on the attack success rate of the student model, including the scale of the teaching dataset and the model structure of the student model. To further evaluate the effectiveness of our method, we conduct one control experiment that uses a random pre-trained model to generate adversarial examples termed as ‘blind attack’ (see Sec. IV-E).
Iv-a Experiment Settings
In the following experiments, We reuse 10 deep learning apps in the work of Huang et al. [huang2021robustness]
so that we can compare our approach with theirs. Considering the workload of training student models and applying different attack methods to generate adversarial examples, it is not trivial to experiment on 10 apps. For example, one teaching dataset is ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC2012)[imagenet] whose size is above 138GB and the number of images is about 14 million. This takes more than a week for a computer with 8 NVIDIA Tesla K40s.
|1||Fresh Fruits||Identify if the fruit is rotten|
|2||Image Checker||Image classifier based on ImageNet|
|3||Tencent Map||Identify road condition|
|4||Baidu Image||Identify flowers|
|5||My Pokemon||Identify pokemon|
|6||Palm Doctor||Identify skin cancer|
|7||QQ browser||Identify plants|
|10||Bei Ke||Identify scenes|
The selected apps are listed in Table.I. For convenience, in the following experiments, every app is represented by its ID instead of name. Deep learning functionalities of each app are also listed, which are useful for finding suitable dataset.
Research questions (RQ) 1 to 3 discuss the relationship between attack performance and student structure (RQ1), teaching dataset size (RQ2), and hyper-parameter of attack algorithm (RQ3). We compare the attack success rates between our approach with [huang2021robustness] in RQ4. All these experiments111In the process of training student models, we set optimizer to SGD [torchsgd], loss function to CrossEntropyLoss [torchcross] , learning rate to 0.001, and epoch to 30.
, learning rate to 0.001, and epoch to 30.are conducted in a Linux box with Nvidia(R) GeForce RTX(TM) 3080 GPU and 128G RAM.
Iv-B RQ1: How the structure of the student model influences the attack rate.
Motivation. The stronger the student model’s ability to imitate the teacher model is, the higher the final attack success rate is. Since the internal information of the teacher model is unknown, choosing the most similar student model by comparing weights and structure can be impossible. Therefore, in this RQ, we intend to explore how the structure of the student model influences the attack rate and analyze the underlying reasons. The attack rate is defined in Sec. III-D.
Approach. This RQ explores how the structure of the student model influences the attack rate, so only one teacher model and only one attack algorithm are needed. In this RQ, we randomly select one app (i.e., No. 1). The deep learning model inside the app is considered as the teacher model, MIFGSM as the attack algorithm. The teaching dataset is a dataset from Kaggle [kaggledatasetfruit] with 13.6K images. The teaching dataset is shuffled and then divided into training and testing sets at 4:1. We carefully select 16 student models that are qualified for this image classification task. These student models are grouped into four categories with the decrease of their model sizes and complexity:
VGGs [simonyan2014very]: VGG11, VGG13, VGG16, VGG19;
Resnets [kai2016deep]: Resnet18, Resnet34, Resnet50, Resnet101, Resnet152;
Densenets [huang2017densely]: Densenet161, Densenet169, Densenet201; and
Small nets: MobilenetV2 [mark2018mobilenet], ShufflenetV2 [ma2018shufflenet], Squeezenet [iandola2016squeezenet], Googlenet [szegedy2015going].
Among the above four categories, VGGs have the highest complexity in terms of the number of weights, which is 3-4 times that of Resnets and Densenets, and 20-30 times that of Small nets. Resnets and Densenets have similar sizes and complexity.
Result. Fig. 3 shows the comparison of attack performance among four student model categories. The abscissa represents attack success rates in percentage, and the ordinate gives four categories. In every single box, the triangle represents the mean of attack success rate and the line represents the median. The leftmost and rightmost edges represent minimum and maximum attack success rates in the corresponding categories.
The attack success rates are divided into three segments. The first watershed is between Small nets and Resnets, the average attack success rate of Small nets is 29.4%, the minimum is 23.97% (, where and are defined in Sec. III-D), and the maximum is 32.86% (). The second watershed is between Densenets and VGGS. Densenets and Resnets who have similar model complexities also have similar attack success rates. Densenets’ average attack success rate is 37.82%, the minimum is 36.88% (), and the maximum is 38.92% (). Resnets’ average attack success rate is 35.28%, the minimum is 33.68% (), and the maximum is 37.79% (). The most successful student models are VGGs. Their average attack success rate is 51.44%, the minimum is 47.42% (), and the maximum is 56.57% ().
Throughout the experiment, we found that a more sophisticated student model has a better opportunity and flexibility to adjust its weights to better imitate the teacher model. The better the student model imitates, the stronger the transferability of the adversarial examples generated by the student model is.
Num. of apps. Due to page restriction, we only present the result of one app. We also perform the same experiments on other apps and observe the same trend on them.
[title=Answer to RQ1,boxrule=1pt,boxsep=1pt,left=2pt,right=2pt,top=2pt,bottom=2pt] To better imitate the teacher model and reach a higher attack success rate, a student model with a more complex structure is a better choice. Among 16 models in 4 categories, VGGs have a relatively high average attack success rate compared with other student model candidates.
Iv-C RQ2: How the scale of the teaching dataset can influence the attack rate.
Motivation. Training a student model on a large teaching dataset consumes lots of time and computing resources. The size of the teaching dataset can influence the prediction accuracy of the student model. The prediction accuracy reflects how well the student model imitates the teacher model. Thus, this RQ explores how the scale of the teaching dataset can influence the attack rate.
Approach. As this RQ explores how the scale of the teaching dataset influences the attack success rate, so only one teacher model, one attack algorithm, and one structure of the student model are needed. In this RQ, we randomly select one app (i.e., No.4) as the teacher model, MIFGSM as the attack algorithm, and VGG11 trained on a teaching dataset from TensorFlow [tfdataset] with 3.6K images as the student model.
The teaching dataset is shuffled and then divided into training and testing sets at 4:1. The attack success rate and student model accuracy of each student model is evaluated on the same testing set. training sets are obtained from the left part of the teaching dataset. Note that without training, the method degenerated to ‘blind attack’.
Result. Fig. 4(a) shows the relationship between the size of the teaching dataset (i.e., abscissa in Fig. 4(a)) and attack success rate (i.e., ordinate in Fig. 4(a)). For example, point (10, 28.28) represents that the attack success rate of the student model trained on the teaching dataset with 367 images (10% of original size) is 28.28%.
Compared with the models trained on our teaching dataset (black points), the performance of blind attack (blue points) is 2-4 times worse. It proves that a teaching dataset generated by our pipeline is indispensable. To better reflect the relationship, trend line is based on a logarithmic function , where , and control how well the trend line fits the scattered points. The trend line shows that the attack success rate increases rapidly and gradually stabilizes with the teaching dataset increases. With the increase of the scale of the teaching dataset, the student model can further adjust its weights, resulting in better imitation of the teacher model and higher transferability of generated adversarial examples.
The reason for the gradual stabilization of the trend line needs to be explained with the help of Fig. 4(b). The abscissa in Fig. 4(b) represents the teaching dataset size, which is the percentage of the original dataset, and the ordinate represents the accuracy of the student model in percentage. For example, point (20, 86.02) represents that the prediction accuracy of the student model trained on the teaching dataset with 687 images (20% of original size) is 86.02%. Compared with other cases, a model without training (i.e., (0, 19.59) in Fig. 4(b)) has a poor performance. (20, 86.02) is a key point, after which the accuracy of the student model becomes stable.
Higher accuracy means a higher model similarity between the student model and the teacher model. The key point (20, 86.02) in Fig. 4(b) represents that the imitation ability of the student model is close to its upper bound. Therefore, further increasing the teaching dataset brings negligible improvement in terms of the attack success rate. This is why the growth of attack success rate in Fig. 4(a) becomes stable after this key point.
Num. of apps. Same as RQ1, we perform the same experiments on other apps and observe the same trend on other apps.
[title=Answer to RQ2,boxrule=1pt,boxsep=1pt,left=2pt,right=2pt,top=2pt,bottom=2pt] A teaching dataset generated by our pipeline is necessary for a high attack success rate. The accuracy of the student model increases as the size of the teaching dataset grows. The accuracy of the student model reflects how well it imitates the teacher model. When the accuracy of the student model reaches around 85%-90%, its imitation ability is close to the upper bound. Therefore, blindly increasing the teaching dataset contribute less to the attack performance.
Iv-D RQ3: How the hyper-parameter of attack algorithms influences the attack performance.
Motivation. The attack performance is evaluated on both the attack success rate and the degree of the added perturbation to the original input. A key property of the adversarial example is that it is generated by adding tiny perturbations to the original input. If perturbations are too large, they can be noticed by human beings. If the perturbations are too small, the attack success rate reduces. The following experiment explores how the hyper-parameters of attack algorithms can influence attack performance.
Approach. This RQ explores how the hyper-parameters of attack algorithms influences the attack performance, so only one teacher model and one student model are needed. In this RQ, we randomly select one app (i.e., No.1) as the teacher model, VGG11 trained on a teaching dataset from Kaggle [kaggledatasetfruit] with 13.6K images as the student model.
The important hyper-parameter eps controls the degree of image perturbation in all seven attack algorithms (FGSM, BIM, RFGSM, PGD, FFGSM, TPGD, MIFGSM). eps varies from to with step . The initial value is a default value used by the author of these 7 attack algorithms [dong2018boosting, goodfellow2015explaining, tramer2020ensemble, kim2020torchattacks, madry2017towards, zhang2019theoretically, kurakin2016adversarial]. We set the end value to . Although larger can bring a higher attack success rate, the perturbations of the image also become larger. A higher degree of perturbations brings a more significant difference between the original input image and the adversarial example. As a result, people can distinguish the difference between the original input and the adversarial example. To ensure the degree of the perturbations is minuscule, the maximum of eps is set to . This reduces the upper bound of the success rate but can ensure perturbations to be unperceivable.
Result. As shown in Fig. 5, the attack success rate of FGSM ranges from 23.22% () to 48.74% (), RFGSM ranges from 1.96% () to 48.85% (), FFGSM ranges from 22.28% () to 51.79% (), TPGD ranges from 21.11% () to 40.72% (), MIFGSM ranges from 34.25% () to 71.47% (), BIM ranges from 34.36% () to 62.93% (), and PGD () ranges from 34.40% () to 62.97%. All algorithms reach its highest attack succes rate with .
Fig. 5 also indicates that different attack algorithms have different sensitivity to the adjustment of eps. However, is an important watershed for all seven attack algorithms. When , the attack success rate decrease rapidly. This is because that the perturbations are too minuscule to spoof the teacher model.
Another important watershed is . As shown in Fig. 6, when ( and colomun in Fig. 6) the adversarial examples have relatively large perturbations. Such perturbation can be detected by human beings, which causes the adversarial example unqualified.
Since attack performance is determined by the success rate and the degree of perturbation, we evaluate the performance on both sides.
Attack success rate. MIFGSM shows the highest attack success rate through all 7 attack algorithms, reaching 71.47% when . Also, it can maintain a relatively rapid growth rate after the first watershed . The reason for its strong attack capability is that the adversarial examples generated by it are more transferable. Compared with other FGSM based attack algorithms, the highest attack success rates of FFGSM, RFGSM, and FGSM are 51.79%, 48.85%, and 48.74%, respectively. The reason why MIFGSM outperforms other algorithms is that it invites the momentum term into the iterative process [dong2018boosting].
Degree of perturbation. Although the attack success rate of MIFGSM keeps increasing quickly after both watersheds, the perturbations in adversarial examples become detectable by the human visual system when . This behavior is common among all 4 FGSM based attack algorithms, so adjusting eps to or higher is not recommended when using this type of algorithm. For BIM and PGD, the highest attack success rates of them are the same (62.93%). Although the attack success rate is less than MIFGSM’s, the degree of perturbation in their generated adversarial example is much smaller than MIFGSM. Therefore BIM and PGD with are recommended.
Num. of apps. Same as RQ1, we perform the same experiments on other apps and observe the same trend on other apps.
[title=Answer to RQ3,boxrule=1pt,boxsep=1pt,left=2pt,right=2pt,top=2pt,bottom=2pt] An ideal eps interval is [8/255, 16/255]. Keeping eps in this interval can guarantee a higher attack success rate with limited added perturbation.
Iv-E RQ4: Comparison among our approach, ModelAttacker and blind attack
Motivation. We compare the performance among our approach, ModelAttacker proposed by Huang et al. [huang2021robustness] and blind attack on 10 apps in Table. I. The experiment settings of ModelAttacker and blind attack are as same as our approach.
Approach. The attack success rates of all apps in Table. I are tested with the best student model structure, the most cost-effective teaching dataset size, and attack algorithms with a suitable value of eps (defined in Sec. IV-D).
Teaching dataset. The teaching datasets for most apps are fetched from public datasets, including Kaggle-Fruit [kaggledatasetfruit] (No. 1), ILSVRC2012 [imagenet] (No. 2, 8, 10), Kaggle-Road [kaggledatasetroad] (No. 3), Tendorflow-Flowers [tfdataset] (No. 4), Kaggle-Pokemon [kaggledatasetpokemon] (No. 5), and HAM10000 [hamdataset] (No. 6). For the other 2 apps (No. 7, 9), the teaching datasets are crawled from Google Images [googleimages]. These datasets are available in our online artefact [onlineartefacts]. Each teaching dataset is shuffled and then divided into training and testing sets at 4:1.
Student model. Based on the result of RQ1, we select VGG11 as the student model for all 10 apps.
Attack algorithm. Based on the result of RQ3, we select MIFGSM as the attack algorithm and set to .
Result. As shown in Fig. 7, the height of the first bar for each app represents our attack success rate, the second is for ModelAttacker, and the third is for the blind attack. Apps with No.4,5,6,7,10 can be attacked by the blind attack so the other 5 apps do not have the bars to show the results of blind attack.
The range of our attack success rate is from 34.98% to 91.10% and on average is 66.60%. Our approach does not need the weights and structures of deep learning models inside the victim apps. Although with these constraints, our approach still gets a higher attack success rate for 8 of 10 apps compared with ModelAttacker. Specifically, the attack success rate of our approach is 8.27% to 50.03% higher than ModelAttacker, and on average is 36.79% higher than ModelAttacker.
For the left 2 apps, ModelAttacker can reach a higher attack success rate. This is because that they can compare the similarity between the victim model (i.e., teacher model) and their substitute model (i.e., student model) with the knowledge of weights and structure of the teacher model [huang2021robustness]. However, our approach is a black box attack so the weights and structure of the teacher model are unknown.
Compared with blind attack which is the common basic black-box attack method, our approach can attack all 10 apps successfully while the blind attack can only attack 5 of them. Meanwhile, the attack success rate of our approach is at least 2.6 times higher than the blind attack.
[title=Answer to RQ4,boxrule=1pt,boxsep=1pt,left=2pt,right=2pt,top=2pt,bottom=2pt] Our approach can indeed effectively attack 10 selected deep learning apps, thus providing a new perspective and method for evaluating app reliability. Compared with existing methods, we can reach a relatively high attack success rate of 66.60% on average, outperforming others by 27.63%.
V-a Threats to Validity
To minimize the bias of the results, all three experiments in this paper vary a single variable while fixing other variables to evaluate its impact on attack performance. Different from other studies on deep learning apps [li2021deeppayload, huang2021robustness], our experiments are based on a larger dataset and generate a considerable number of adversarial examples for the victim app in each experiment. Considering the workload of training, we select 10 apps that are also used by [huang2021robustness] to evaluate our approaches. Since the success rates of attacks are based on these apps, we do not attempt to generalize our results to all deep learning apps.
Our study has two limitations. The main limitation is that our approach only suitable for non-obfuscated apps. To get the teaching dataset from a teacher model inside the app, we have to locate and learn how the model is loaded and used in the app. However, if protection techniques, such as code obfuscation, are applied to the app, it can be hard to find the API patterns. Another limitation is that the pipeline developed in this paper focuses on computer vision apps. On the one hand, the existing studies [huang2021robustness, li2021deeppayload] on adversarial attacks are mainly in this field. It can be easy to compare our work with other approaches. On the other hand, adversarial attacks in other deep learning fields lack a widely accepted evaluation standard. Without widely adopted and convincing criteria, it can be unfair to compare the results of our approach with others [dong2021sentence, wang2019natural]. Thus, we only consider computer vision apps in our experiments.
V-C Consequences of Proposed Attacks
From the perspective of a malicious attacker, the adversarial examples can threaten users’ property, privacy, and even safety. Dong et al. [dong2019efficient] applied adversarial examples to a real-world face recognition system successfully. As a popular way to unlock devices, the breach of the face recognition system means users’ privacy is at risk of leakage. Although it is not easy to use adversarial examples in the real world, relevant research has made progress [sun2018survey, Boloor2019simple, Xu2020what]. The threat of adversarial examples is worthy of attention.
According to the study by Sun et al. [Sun2020MindYW], only 59% of deep learning apps have protection methods for their models. To fill this gap, we propose several protection methods for both deep learning models and deep learning apps.
Deep learning model protection scheme. To protect the deep learning model inside an app from black-box attacks, there are two practical solutions:
Using self-developed deep learning models instead of open-source models can reduce the probability of being attacked. For a self-developed deep learning model, it can be hard to find or train a qualified student model to imitate the teacher model. Without a qualified student model, it can be hard to reach a high attack success rate; and
Training the deep learning model on a private dataset instead of a public dataset can reduce the success rate of being attacked. The size of the teaching dataset is critical for training a qualified student model (See Sec. IV-C). If the deep learning model inside the app is trained on a public dataset, the attacker can find the same dataset with ease. A student model can gain a powerful imitating ability by being trained on the same dataset as the teacher model. As a result, the adversarial examples generated on the student model can spoof the teacher model with a high success rate.
Deep learning app protection scheme. Using a self-developed deep learning model and collecting a private dataset is time-consuming and costly. There are also some common techniques to protect deep learning apps from being attacked [Sun2020MindYW, li2021deeppayload, huang2021robustness]:
Code obfuscation can prevent attackers from finding out how the apps invoke the deep learning models. As a result, attackers cannot generate a labeled teaching dataset by instrumenting a DummyActivity. Thus, the attack method degenerates to the blind attack with a low attack success rate;
Hash code can prevent attackers from modifying the model inside the victim app. Li et al. [li2021deeppayload] use images with special patterns to fool the models by adding specific layers into deep learning models inside apps. With hash code, such backdoor attacks can be detected with ease; and
Encryption can prevent attackers from knowing the structure and weights of the model. It is a fundamental method to protect the deep learning model inside the app. Without the knowledge of the victim model’s structure and weights, attackers can only rely on black-box attack methods, which can be time-consuming (i.e., finding dataset, training the student model).
Vi Related Work
Vi-a Substitute-based adversarial attacks
To overcome black-box attacks’ unreachability to the internals of victim deep learning models, attackers can train a local model to mimic the victim model, which is called substitute training. With the in-depth study of black-box attacks on deep learning models, substitute-based adversarial attacks have received a lot of attention. Cui et al. [cui2020substitute] proposed an algorithm to generate the substitute model of CNN models by using knowledge distillation and boost the attacking success rate by 20%. Gao et al. [gao2019boosting]
integrated linear augmentation into substitute training and achieved success rates of 97.7% and 92.8% in MNIST and GTSRB classifiers. In addition, some researchers studied substitute attacks from a data perspective. Wang et al.[Wang2021delving] proposed a substitute training approach that designs the distribution of data used in the knowledge stealing process. Zhou et al. [Zhou2020dast]
proposed a data-free substitute training method (DaST) to obtain substitute models by utilizing generative adversarial networks (GANs). However, the research on how to apply substitute-based adversarial attacks to the apps and the analysis of their performance is lacking. This paper fills the blank in this direction.
Vi-B Deep Learning Model Security of Apps
Previous works on the security of deep learning models in mobile apps mainly focus on how to obtain information about the structure and weights of the model. Sun et al. [Sun2020MindYW] developed a pipeline that can analyze the model structure and weights and revealed that many deep learning models can not be extracted directly from mobile apps by decompiling the apks. Huang et al. [huang2021robustness] developed a pipeline to find the most similar model on TensorFlow Hub and then use it to generate adversarial examples. However, they still need to know the information of layers in the model to calculate similarity, which is used to find the most similar model on TensorFlow Hub for launching attacks. Different from Huang et al.’s work [huang2021robustness], we do not require the knowledge of the structure and weights of the model. Li et al. [li2021deeppayload] proved the possibility to perform backdoor attacks on deep learning models and succeeded to use images with unique patterns to fool the models. To perform such attacks, Li et al. [li2021deeppayload] modified the internal structure of the model inside the app. Whereas, as a black-box approach, we do not need to alter anything inside the deep learning model. Different from existing works, our work focuses on investigating the security of the deep learning models in mobile apps in a complete black-box manner. Our work offers a new perspective and shed the light on security research in deep learning models for mobile.
Vi-C Adversarial Attacks and Defenses to Deep Learning Model
Adversarial attacks show their power in spoofing deep learning models related to computer vision. Researches on image classification attack methods account for a large part. The most popular technique is adversarial image perturbations (AIP) [oh2017adversarial]. Dai et al. [dai2018adversarial]
developed an attack method based on genetic algorithms and gradient descent, which demonstrates that the Graph Neural Network models are vulnerable to these attacks. Stepan Komkov and Aleksandr Petiushko[komkov2019advhat] proposed a reproducible technique to attack a real-world Face ID system. Baluja et al. [baluja2018learning]
developed an Adversarial Transformation Network (ATN) to generate adversarial examples. It reaches a high success rate on MNIST-digit classifiers and Google ImageNet classifiers.
As adversarial attacks pose a huge threat to deep learning models, researches on how to protect models from such attacks have also drawn wide attention. Li et al. [li2020defense] presented the gradient leaking hypothesis to motivate effective defense strategies. Liao et al. [liao2018defense] proposed a high-level representation guided denoiser (HGD) as a defense for deep learning models related to image classification. Zhou et al. [zhou2020adversarial] proposed a defense method to improve the robustness of DNN-based image ranking systems. Cissé et al [moustapha2017parseval] proposed Parseval networks to improve robustness to adversarial examples. Alexandre et al. [araujo2020robust] used a Randomized Adversarial Training method to improve the robustness of deep learning neural networks.
However, even though there are a lot of researches on the adversarial attack and defense methods of deep learning models, research on this topic about mobile apps is scant. Our work proves that a black-box attack on the deep learning model inside the apps is feasible and provides a new perspective to evaluate the robustness of deep learning apps.
In this paper, we propose a practical black-box attack approach on deep learning apps and develop a corresponding pipeline. The experiment on 10 apps shows that the average attack success rate reaches 66.60%. Compared with existing adversarial attacks on deep learning apps, our approach outperforms counterparts by 27.63%. We also discuss how student model structure, teaching dataset size, and hyper-parameters of attack algorithms can affect attack performance.