However, there are two main challenges. (i) MLaaS raises safety and privacy concerns on sensitive data such as patient treatment records. Even though the DNN model structures are black boxes, MLaaS can leak sensitive information about the training data used to build back-end models. For instance, the membership inference attack (MIA) is one of the critical inference attacks exploiting this vulnerability. Using MIA, the adversary monitors the distinctive behavior of back-end models through repeated, carefully crafted inference requests to further extract information about the training data. (ii) DNN models are evolving fast in order to satisfy the diverse characteristics of broad applications. As DNNs get deeper and model sizes get larger (e.g., ResNet-152 with 152 layers and 11.3 billion FLOPs), the high computation and large model size introduce substantial data movements, limiting their ability to provide a user-friendly experience on resource-constrained edge devices [1, 11, 2].
To address the MIA challenge, several mechanisms have been developed. Differential privacy (DP), a major privacy-preserving mechanism against general inference attacks that is based on adding noise to the gradients or the objective function of the training model, has been applied to different machine learning models [12, 13, 14, 15, 16]. Although its robustness has been proven, the utility cost of DP (e.g., creating indistinguishable non-membership datasets, calculating a bound on the function sensitivity) is hard to keep acceptable, since DP imposes a significant accuracy loss when protecting complicated models and high-dimensional data with large noise. Another defense mechanism is game theory, e.g., the Min-Max game, which guarantees information privacy. The maximum gain of the inference model is treated as a new regularization called adversarial regularization and is minimized together with the training model loss. Unfortunately, the Min-Max game introduces extra computational cost on top of the classifier training process. Last but not least, neither DP nor the Min-Max game addresses the second challenge, i.e., the high computation and large model size of DNNs.
In this work, in order to simultaneously address the two challenges, we design and implement an innovative MIA defense method that is optimized for the dual objectives of privacy and efficiency. We show that an effective DNN model compression technique helps against MIA while simultaneously reducing model storage and computational complexity with very small accuracy loss. We present the main contributions of our work:
As the first attempt to simultaneously address the challenges of large model size, high computational cost, and vulnerability against MIA on DNNs, we jointly formulate model compression and MIA as MCMIA, and provide an analytic method of solving the problem.
We investigate MCMIA–Pruning to evaluate whether model compression has the same effectiveness as the Min-Max game, i.e., reduces attack accuracy. We provide the attack and testing accuracy of the baseline (without defense or pruning) and MCMIA–Pruning. Experimental results show that the attack accuracy using pruning is 13.6%, 3%, 3.77%, 9.1%, 3.48%, 2.11%, and 5% lower than the attack accuracy of the baseline, for LeNet-5 on MNIST, VGG16 on CIFAR-10, MobileNetV2 on CIFAR-10, VGG16 on CIFAR-100, MobileNetV2 on CIFAR-100, MobileNetV2 on ImageNet, and ResNet18 on ImageNet, respectively.
We verify that model compression performs better than the Min-Max game, i.e., further reduces attack accuracy. Experimental results show that the attack accuracy using pruning is 2.6%, 1.34%, and 10% lower than the attack accuracy of the Min-Max game, for LeNet-5 on MNIST, VGG16 on CIFAR-10, and VGG16 on CIFAR-100, respectively.
We further investigate the combination of model compression and the Min-Max game, show that the combination maximally enhances DNN model privacy, and formulate it as MCMIA–Pruning & Min-Max. Experimental results show that MCMIA–Pruning & Min-Max achieves attack accuracy 3.03% and 1% lower than MCMIA–Pruning alone, for VGG16 on CIFAR-10 and VGG16 on CIFAR-100, respectively.
Experimental results show that our MCMIA model can reduce the information leakage from MIA. Our proposed method significantly outperforms DP in defending against MIA. Thanks to the hardware-friendly characteristics of model compression, our proposed MCMIA is especially useful for deploying DNNs on resource-constrained platforms in a privacy-preserving manner.
II Related Work and Background
II-A DNN Model Compression (MC)
State-of-the-art (SOTA) DNNs contain multiple cascaded layers, and at least millions of parameters (i.e., weights) for the entire model [1, 17, 18, 2, 3]. The large model size and computational cost limit their ability to provide a user-friendly experience, especially on resource-constrained platforms [1, 11]. To address the challenges, prior works have focused on developing DNN model compression algorithms such as weight pruning [19, 20, 21, 22, 23]
(i.e., removing weights along specific dimensions or in any desired weight matrix shapes) utilizing different regularization techniques to explore sparsity. The key idea is to keep the critical weights and to develop optimization techniques that regularize the loss function to maintain model accuracy, so that the neural network is represented by a much simpler model. A simpler model, in turn, brings acceleration in computation and reduction in weight storage, hence achieving fast training and inference.
ADMM-based DNN Model Compression: Recent works [22, 24] have shown that by incorporating the alternating direction method of multipliers (ADMM) into DNN model compression, one can achieve a high weight reduction ratio while maintaining accuracy. Consider an optimization problem $\min f(\mathbf{W}) + g(\mathbf{Z})$ with combinatorial constraints, which is difficult to solve directly using standard optimization tools. Using ADMM, the problem can be decomposed into two subproblems on $\mathbf{W}$ and the auxiliary variable $\mathbf{Z}$: the first subproblem derives $\mathbf{W}$ given $\mathbf{Z}$ by minimizing $f(\mathbf{W}) + \frac{\rho}{2}\|\mathbf{W} - \mathbf{Z} + \mathbf{U}\|_F^2$; the second subproblem derives $\mathbf{Z}$ given $\mathbf{W}$ by minimizing $g(\mathbf{Z}) + \frac{\rho}{2}\|\mathbf{W} - \mathbf{Z} + \mathbf{U}\|_F^2$, where $\mathbf{U}$ is the dual variable and both penalty terms are quadratic functions. In this way, the two subproblems can be solved separately and iteratively until convergence. Originally, ADMM was used to accelerate the convergence of convex optimization problems and to enable distributed optimization, where the optimality and fast convergence rate have been proven [26, 25]. One special property of ADMM is that it can effectively deal with a subset of combinatorial constraints and yields optimal (or at least high-quality) solutions [27, 28]. The constraints arising in DNN model compression belong to this subset, therefore ADMM is applicable to DNN model compression. Consider the $i$-th layer in an $N$-layer DNN (containing both convolutional and fully connected layers), with weights and bias represented by $\mathbf{W}_i$ and $\mathbf{b}_i$. The overall DNN model compression problem is given by $\min f(\{\mathbf{W}_i\}, \{\mathbf{b}_i\})$, subject to $\mathbf{W}_i \in \mathbf{S}_i$, where $f$ is the loss function of the DNN model, $\mathbf{S}_i = \{\mathbf{W}_i : \mathrm{card}(\mathbf{W}_i) \le \alpha_i\}$, and $\alpha_i$ is the specified number of weights in the $i$-th layer. According to [22, 25], the problem can be rewritten as $\min f(\{\mathbf{W}_i\}, \{\mathbf{b}_i\}) + \sum_{i=1}^{N} g_i(\mathbf{Z}_i)$, subject to $\mathbf{W}_i = \mathbf{Z}_i$, where $\mathbf{Z}_i$ is an auxiliary variable and $g_i$ is the indicator function of $\mathbf{S}_i$. With the formation of the augmented Lagrangian, this problem can be decomposed into two subproblems.
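The ADMM iteration described above can be sketched in a few lines. The helpers below (`project_cardinality`, `admm_prune`, and the `loss_grad` callback) are illustrative names for a minimal single-tensor sketch, not the paper's implementation:

```python
import numpy as np

def project_cardinality(W, k):
    """Euclidean projection onto {W : card(W) <= k}: keep the k
    largest-magnitude entries of W and zero out the rest."""
    Z = np.zeros_like(W)
    if k > 0:
        idx = np.unravel_index(np.argsort(np.abs(W), axis=None)[-k:], W.shape)
        Z[idx] = W[idx]
    return Z

def admm_prune(W, loss_grad, k, rho=0.1, lr=0.1, iters=500):
    """Alternate the three ADMM updates: (1) a gradient step on the
    loss plus the quadratic penalty (rho/2)||W - Z + U||^2, (2) the
    projection Z = Pi_S(W + U), (3) the dual update U += W - Z."""
    Z = project_cardinality(W, k)
    U = np.zeros_like(W)
    for _ in range(iters):
        W = W - lr * (loss_grad(W) + rho * (W - Z + U))  # first subproblem
        Z = project_cardinality(W + U, k)                # second subproblem
        U = U + W - Z                                    # dual variable update
    return Z  # hard-pruned weights satisfying the cardinality constraint
```

With a simple quadratic loss pulling toward a dense target vector, the iteration converges to a solution that keeps only the k dominant weights.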
II-B Membership Inference Attack (MIA)
In reality, users are usually unwilling to share data due to privacy concerns. Especially in the medical field, sharing private patient information is prohibited by law or regulation. Given an input, the adversary's goal in MIA is to determine whether it belongs to the training dataset $D$. If the attacker can correctly determine that a given input belongs to the training data, this constitutes information leakage.
As the first work on using MIA against machine learning models, Shokri et al. used different neural networks as attack models that take the prediction from the target model to determine whether a data record is from the training set of the target model. Since the target model is a black-box API, Shokri et al. proposed to construct multiple shadow models to mimic the target model's behavior and derive the data necessary, i.e., the posteriors and the ground-truth membership, to train attack models. Nasr et al. introduced a privacy mechanism to train machine learning models such that the predictions on their training data are indistinguishable from the predictions on other data points from the same distribution. Salem et al. adopted one shadow model instead of multiple ones to duplicate the behavior of the target model. Detailed experiments on eight datasets, ranging from images to text, were performed on various DNN models to demonstrate that the adversary can achieve similar accuracy as Shokri et al. with the proposed one shadow and one attack model.
II-C Defense Mechanisms against MIA
Game theory is one major defense mechanism against MIA. Most game-theory-based mechanisms minimize the privacy loss against the strongest attacker by converting the utility function into a min-max optimization problem. After the Generative Adversarial Network (GAN) was proposed, new algorithms emerged for solving min-max problems while training DNN models. For instance, using a similar framework as GAN, Nasr et al. proposed a Min-Max game mechanism and formulated the gain of MIA as a new regularization, which is maximized while the classifier's prediction loss is minimized. We use it as a comparison with our experimental results.
DP is another major defense mechanism against MIA. There are multiple DP-based defense mechanisms [37, 15, 38] that add noise to the gradients or the objective function of the training model. However, the existing mechanisms impose a significant accuracy loss when protecting complicated models and high-dimensional data with a large noise parameter. Differential privacy mechanisms are difficult to achieve with negligible utility loss, where the utility loss is related to making the distributions of all input data indistinguishable and to computing the gradient noise within a tight bound. There are some other defense directions. For example, the model stacking mechanism combines the results of multiple classifiers to prevent the attacker from inferring a single target classifier. The dropout [39] mechanism randomly adds noise to the target classifier's prediction.
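As a point of reference for how these DP defenses perturb training, the following is a minimal sketch of one noisy gradient step in the style of DP-SGD (clip each per-example gradient, average, add Gaussian noise). The function `dp_sgd_step` and its parameter values are illustrative assumptions, not an implementation from the works cited above:

```python
import numpy as np

def dp_sgd_step(w, per_example_grads, lr=0.1, clip_norm=1.0,
                noise_mult=1.1, rng=np.random.default_rng(0)):
    """One DP-SGD-style update: clip each per-example gradient to L2
    norm clip_norm, average the clipped gradients, then add Gaussian
    noise whose scale grows with noise_mult."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm / len(per_example_grads),
                       size=w.shape)
    return w - lr * (mean_grad + noise)
```

The larger `noise_mult` is, the stronger the privacy guarantee but the larger the utility loss, which is exactly the trade-off discussed above.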
The existing defenses have at least one of the following limitations: 1) they require extra computation, such as extra weight storage and noise calculation, meaning these mechanisms introduce extra computational cost on top of the training process; 2) they achieve privacy protection with significant utility loss.
III MCMIA: Problem Statement
In this work, we investigate the following question: Will an effective DNN model compression technique help against MIA while simultaneously reducing model storage and computational complexity with very small accuracy loss? We start by formulating the joint problem of model compression and MIA.
We consider the MIA problem in a black-box setting, which means the adversary can only observe the input and output of the model. Figure 1 shows an illustrative diagram of using model compression against MIA in DNNs. We use $h$ to denote the adversarial inference model. $h$ takes the feature of a data record, denoted $x$, its label, denoted $y$, and the prediction of the classification model, denoted $f(x)$, as inputs, and outputs the probability that $(x, y)$ is a member of the training set $D$. We use $p_D$ to denote the conditional probability of $(x, y)$ being a member of $D$, and $p_{D'}$ for non-member examples from the non-training set $D'$. When the conditional probability is known, we can formulate the gain function for MIA as follows:
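Written out with this notation, the gain takes the adversarial-regularization form of Nasr et al. (a reconstruction consistent with the loss used in Section IV):

```latex
G_f(h) = \tfrac{1}{2}\,\mathbb{E}_{(x,y)\sim p_D}\!\left[\log h\big(x, y, f(x)\big)\right]
       + \tfrac{1}{2}\,\mathbb{E}_{(x,y)\sim p_{D'}}\!\left[\log\big(1 - h\big(x, y, f(x)\big)\big)\right]
```

The first term rewards the inference model for recognizing members, the second for recognizing non-members.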
III-A Problem Formulation
We consider the following MIA assumption: the adversary has access to a data record and can obtain the prediction from the black-box DNN target model. Based on the difference between the model prediction distributions on the membership and non-membership datasets, the adversary determines whether the data record belongs to the model's membership dataset. Furthermore, the adversary tries to maximize the accuracy of this determination.
We argue that model compression can be used against MIA by pruning the model weights to build a defense system, so that the model predictions on the membership (training) dataset and the non-training dataset become indistinguishable. In this case, it becomes more difficult for the adversary to determine to which dataset the observed data record belongs, and the risk of membership privacy loss is reduced. Ideally, the adversary can only make a determination by random guessing. At the same time, the classification accuracy of the model will not be affected, or only slightly. In other words, the utility cost of the defense (e.g., classification accuracy loss) is negligible.
In our model, we first use ADMM-based model compression to systematically prune the DNN weights under the worst-case adversary gain, then minimize the classification loss function as a trade-off between privacy and classification accuracy. We initially formalize the MCMIA problem as

$\min_{\{\mathbf{W}_i\},\{\mathbf{b}_i\}} \; \mathcal{L}\big(f(x), y\big), \quad \text{subject to } \mathbf{W}_i \in \mathbf{S}_i, \; i = 1, \dots, N,$

where the sets $\mathbf{S}_i$ are the model compression projections that constrain, within a certain boundary, the possibility that the adversary makes a correct determination. As a further step, we consider the Min-Max game to strengthen our MCMIA–Pruning, and the corresponding optimization problem becomes

$\min_{\{\mathbf{W}_i\},\{\mathbf{b}_i\}} \; \Big[ \mathcal{L}\big(f(x), y\big) + \lambda \max_{h} G_f(h) \Big], \quad \text{subject to } \mathbf{W}_i \in \mathbf{S}_i,$

where $\lambda$ is a constant acting as an adversarial regularization factor.
III-B Problem Analysis
By pruning the weights of the training model systematically, the output distributions on the training and non-training datasets become indistinguishable. In other words, model compression reduces the gain of the adversary. Meanwhile, we get a "free lunch": we simultaneously achieve model storage and computational cost reduction with very small accuracy loss.
Following the Min-Max game formulation, the gain of the adversary can be written as

$G_f(h) = \tfrac{1}{2}\,\mathbb{E}_{(x,y)\sim p_D}\big[\log h(x, y, f(x))\big] + \tfrac{1}{2}\,\mathbb{E}_{(x',y')\sim p_{D'}}\big[\log\big(1 - h(x', y', f(x'))\big)\big],$

where $(x, y)$ is a data record from $D$ and $(x', y')$ is a data record from $D'$. $p_D$ and $p_{D'}$ are the probability distributions of the model's output on training and non-training data records, respectively.
Consider an image $x$ from the training dataset. The difference between the inference model's determinations that $x$ is from the training dataset and from the non-training dataset can be written as

$d(x, y) = h(x, y, f(x)) - \big(1 - h(x, y, f(x))\big),$

where $d$ is the probability difference between the adversary's two binary determinations. Ideally, if a model can be totally protected from MIA, the inference model can only flip a coin to make the determination, with probability 0.5, which means $h(x, y, f(x)) = 0.5$. In other words, $d(x, y) = 0$.
In the proposed MCMIA Min-Max game, given the best strategy of the adversary against any classifier, we design the model compression mechanism as the best response to MIA. After applying model compression to the classification model, the corresponding inference model determination becomes

$\hat{d}(x, y) = h\big(x, y, \hat{f}(x)\big) - \Big(1 - h\big(x, y, \hat{f}(x)\big)\Big),$

where $\hat{f}$ denotes the classification model with model compression. By systematically pruning the weights step by step, the prediction distributions $\hat{p}_D$ and $\hat{p}_{D'}$ on the training and non-training datasets become nearly identical, and the gap between them, along with $\hat{d}$, becomes smaller. Finally, we can obtain a near-perfect $\hat{f}$ against the inference model, which is equivalent to $h(x, y, \hat{f}(x))$ being close to 0.5. In this case, the optimal inference model can only flip a coin to guess whether a data record is from the training dataset, and MCMIA successfully prevents the leakage of training data information.
To summarize, our model is an MC-constrained classification under minimum classification loss. It constrains the gain of MIA, which means the adversary cannot distinguish training data records from non-training records based on the model's predictions.
IV MCMIA: Methodology
IV-A Unified Problem Reformulation of MCMIA
The total loss of MCMIA can be formulated as

$\min_{\{\mathbf{W}_i\},\{\mathbf{b}_i\}} \; \mathcal{L}_{\mathrm{CE}}\big(f(x), y\big) + \lambda\, G_f(h), \quad \text{subject to } \mathbf{W}_i \in \mathbf{S}_i,$

where $\mathcal{L}_{\mathrm{CE}}$ is the cross-entropy loss, $G_f(h)$ is the gain function of the MIA, and $\lambda$, a constant, is the coefficient of the gain function. More specifically, the gain function is

$G_f(h) = \tfrac{1}{2}\,\mathbb{E}_{(x,y)\sim p_D}\big[\log h(x, y, f(x))\big] + \tfrac{1}{2}\,\mathbb{E}_{(x,y)\sim p_{D'}}\big[\log\big(1 - h(x, y, f(x))\big)\big].$

We define $g_i$ as the indicator function of the set $\mathbf{S}_i$, so that the problem above can be rewritten in Lagrangian form. We summarize the MCMIA problem as

$\min_{\{\mathbf{W}_i\},\{\mathbf{b}_i\}} \; \mathcal{L}_{\mathrm{CE}}\big(f(x), y\big) + \lambda\, G_f(h) + \sum_{i=1}^{N} g_i(\mathbf{Z}_i), \quad \text{subject to } \mathbf{W}_i = \mathbf{Z}_i.$
IV-B Solution Strategy
In our algorithm, we systematically solve the reformulated problem subject to the constraints $\mathbf{W}_i = \mathbf{Z}_i$, $i = 1, \dots, N$. The augmented Lagrangian can be decomposed into the following subproblems, whose parameters are updated repeatedly:

$\mathbf{W}^{k+1} = \arg\min_{\mathbf{W}} \; \mathcal{L}_{\mathrm{CE}} + \lambda\, G_f(h) + \sum_{i=1}^{N} \tfrac{\rho}{2}\,\big\|\mathbf{W}_i - \mathbf{Z}_i^{k} + \mathbf{U}_i^{k}\big\|_F^2,$

$\mathbf{Z}^{k+1} = \arg\min_{\mathbf{Z}} \; \sum_{i=1}^{N} g_i(\mathbf{Z}_i) + \tfrac{\rho}{2}\,\big\|\mathbf{W}_i^{k+1} - \mathbf{Z}_i + \mathbf{U}_i^{k}\big\|_F^2,$

$\mathbf{U}_i^{k+1} = \mathbf{U}_i^{k} + \mathbf{W}_i^{k+1} - \mathbf{Z}_i^{k+1},$

where $\mathbf{U}_i$ is the dual variable and $\rho$ is the penalty parameter. The first subproblem can be solved by stochastic gradient descent. Since $g_i$ is the indicator function of the set $\mathbf{S}_i$, the globally optimal solution of the second subproblem can be explicitly derived as

$\mathbf{Z}_i^{k+1} = \Pi_{\mathbf{S}_i}\big(\mathbf{W}_i^{k+1} + \mathbf{U}_i^{k}\big),$

where $\Pi_{\mathbf{S}_i}$ denotes the Euclidean projection onto the set $\mathbf{S}_i$, i.e., keeping the $\alpha_i$ largest-magnitude weights and setting the rest to zero.
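The optimality of this projection is easy to check numerically on small problems. The sketch below, assuming the cardinality set $\mathbf{S}_i$ defined above, compares magnitude-based projection against brute-force search over all supports (`euclidean_project` and `brute_force` are hypothetical helper names):

```python
import numpy as np
from itertools import combinations

def euclidean_project(v, k):
    """Pi_S(v) for S = {z : card(z) <= k}: keep the k largest-|.| entries."""
    z = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-k:]
    z[keep] = v[keep]
    return z

def brute_force(v, k):
    """Minimize ||v - z||^2 over every support of size k (tiny v only)."""
    best, best_err = None, np.inf
    for support in combinations(range(len(v)), k):
        z = np.zeros_like(v)
        z[list(support)] = v[list(support)]
        err = np.sum((v - z) ** 2)
        if err < best_err:
            best, best_err = z, err
    return best
```

Both routines agree: dropping the smallest-magnitude entries minimizes the Euclidean distance, which is why the second subproblem has a closed-form solution.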
V-A Experimental Setup
To evaluate our proposed method, we apply MCMIA to different DNN models including LeNet-5, VGG16, MobileNetV2, and ResNet18 on different datasets (MNIST, CIFAR-10, CIFAR-100, and ImageNet). We use LeNet-5 on the MNIST dataset. On the CIFAR-10 and CIFAR-100 datasets, we use the VGG-16, MobileNetV2, and ResNet18 models to evaluate the prediction accuracy. We also use the MobileNetV2 and ResNet18 models on the ImageNet dataset to show the scalability of our proposed method. LeNet-5 is a classical convolutional neural network with one input layer, two convolutional layers with 5x5 kernels, each followed by an average pooling layer, two fully-connected layers, and one output layer. VGG-16 is a standard convolutional neural network with 13 convolutional layers of kernel size 3x3, followed by 2 fully-connected layers and 1 softmax output layer. MobileNetV2 is a convolutional neural network containing an initial full convolution layer with 32 filters, followed by 19 residual bottleneck layers. ResNet18 is a standard residual network consisting of 8 residual convolution blocks followed by an average pooling layer and a fully-connected layer.
For comparison with the Min-Max game, we use the Min-Max game in the experimental setup above, since it is robust to different attacks while having limited accuracy loss on the targeted model. We also include a brief comparison between DP and MCMIA on the CIFAR-10 and MNIST datasets.
For comparison with DP on the CIFAR-10 dataset, we follow the same architecture as the four-layer (two convolutional layers and two fully-connected layers) CNN classification model used in the DP baseline and compare our results with its reported results. For the MNIST dataset, we use LeNet-5 as the classification model and implement DP, which reaches its optimal solution with a noise parameter of 6.28.
The MCMIA training process is shown in Algorithm 1, which gives the pseudo-code for ADMM model compression on the training classifier $f$ against the MIA model $h$. In every epoch, outside the inner iterations, ADMM model compression systematically prunes the weights of the updated classifier following the solution strategy; in the inner iterations, for a fixed classifier $f$, the MIA model $h$ is trained to distinguish the classifier's predictions on inputs from the training dataset $D$ and the non-training dataset $D'$.
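The alternation in Algorithm 1 can be sketched as follows. `ToyClassifier` and `ToyAttack` are illustrative stand-ins (a magnitude-pruned logistic model and a confidence-threshold attack), not the paper's DNN classifier or inference network:

```python
import numpy as np

class ToyClassifier:
    """Stand-in for f: logistic scores with hard magnitude pruning
    standing in for the full ADMM-based update."""
    def __init__(self, dim, keep):
        self.w = np.zeros(dim)
        self.keep = keep

    def admm_pruned_step(self, X, y, lr=0.1):
        p = self.predict(X)
        self.w -= lr * X.T @ (p - y) / len(y)   # logistic-loss gradient step
        drop = np.argsort(np.abs(self.w))[:-self.keep]
        self.w[drop] = 0.0                      # keep only `keep` weights

    def predict(self, X):
        return 1.0 / (1.0 + np.exp(-X @ self.w))

class ToyAttack:
    """Stand-in for h: a learned confidence threshold."""
    def __init__(self):
        self.t = 0.5

    def fit_step(self, members, nonmembers):
        self.t = 0.5 * (members.mean() + nonmembers.mean())

    def accuracy(self, members, nonmembers):
        return 0.5 * ((members >= self.t).mean() + (nonmembers < self.t).mean())

def train_mcmia(f, h, X_in, y_in, X_out, epochs=50, attack_steps=3):
    """Alternation of Algorithm 1: a pruned classifier update per epoch,
    then attack-model updates on member vs. non-member predictions."""
    for _ in range(epochs):
        f.admm_pruned_step(X_in, y_in)
        for _ in range(attack_steps):
            h.fit_step(f.predict(X_in), f.predict(X_out))
    return f, h
```

The skeleton only illustrates the control flow: the classifier is updated under the sparsity constraint, while the attack model is repeatedly refit against the current, fixed classifier.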
V-B Inference Attack Model
The inference attack model is composed of three fully-connected sub-networks in a hierarchical structure. The prediction vector and the target label are fed into two sub-networks on the first level in parallel, and the processed representations of the two sub-networks are then concatenated and fed into the third sub-network on the second level to make the final prediction. We use the Adam optimizer. For CIFAR-10-CNN in Table I, the inference attack model consists of one fully-connected layer, the same as in the DP baseline.
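A minimal forward pass of such a hierarchical attack model might look as follows, assuming hypothetical layer widths and parameter names (the actual sub-network sizes follow prior work):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def attack_forward(pred_vec, label_onehot, params):
    """Hierarchical attack model: two parallel fully-connected branches
    for the prediction vector and the one-hot label, concatenated into a
    final branch that outputs a membership probability."""
    a = relu(pred_vec @ params["Wp"] + params["bp"])      # prediction branch
    b = relu(label_onehot @ params["Wl"] + params["bl"])  # label branch
    z = np.concatenate([a, b])                            # merged representation
    logit = z @ params["Wo"] + params["bo"]               # second-level branch
    return 1.0 / (1.0 + np.exp(-logit))                   # membership probability

def init_params(n_classes, hp=64, hl=64, rng=np.random.default_rng(0)):
    """Random single-hidden-layer weights; widths hp/hl are assumptions."""
    return {
        "Wp": rng.normal(0, 0.1, (n_classes, hp)), "bp": np.zeros(hp),
        "Wl": rng.normal(0, 0.1, (n_classes, hl)), "bl": np.zeros(hl),
        "Wo": rng.normal(0, 0.1, (hp + hl,)),      "bo": 0.0,
    }
```

The two first-level branches specialize on the posterior vector and on the label, and only their merged representation decides membership, mirroring the structure described above.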
V-C Evaluation Results on MCMIA–Pruning
V-C1 MNIST, CIFAR-10 and CIFAR-100
We compare model compression (using MCMIA–Pruning) and the Min-Max game to investigate whether model compression can constrain the maximum gain of the inference model, i.e., further reduce attack accuracy. We provide the attack accuracy and testing accuracy of the baseline (without defense or pruning), MCMIA–Pruning, and the Min-Max game in Table II. On MNIST, experimental results demonstrate that for LeNet-5, the attack accuracy using MCMIA–Pruning is 13.6% lower than the attack accuracy of the baseline, and 2.6% lower than the attack accuracy of the Min-Max game. From the comparison between DP and MCMIA shown in Table I, MCMIA achieves 25.13% lower attack accuracy than DP with 2.66% higher testing accuracy of the classification model.
On CIFAR-10, experimental results demonstrate that for VGG16, the attack accuracy using MCMIA–Pruning is 3% lower than the baseline attack accuracy and 1.34% lower than the attack accuracy of the Min-Max game. For MobileNetV2, the attack accuracy using MCMIA–Pruning is 3.77% lower than the baseline attack accuracy and close to that of the Min-Max game. As shown in Table I, on a 4-layer CNN, MCMIA has 1% lower attack accuracy than DP, while MCMIA has 7.36% higher testing accuracy of the classification model than DP. On CIFAR-100, experimental results demonstrate that for VGG16, the attack accuracy using MCMIA–Pruning is 9.1% lower than the baseline attack accuracy and approximately 10% lower than that of the Min-Max game. For MobileNetV2, the attack accuracy using MCMIA–Pruning is 3.48% lower than the baseline attack accuracy and close to that of the Min-Max game.
The results indicate that using model compression can help against MIA and that model compression is more effective than using the Min-Max game. Moreover, model compression has significantly less utility cost than DP, and our experiments also show that it is hard for DP to achieve privacy preservation with negligible utility loss. Based on the reported DP experiments, to achieve the same level of attack accuracy, the test accuracy under the DP method is under 70% in the best case and 25% in the worst case on CIFAR-10, depending on the noise parameter. In addition, model compression brings another benefit, shown in Table V: we achieve 15.78X model size reduction for LeNet-5 on MNIST, and at least 10.06X model size reduction on CIFAR-10/CIFAR-100 among VGG16, MobileNetV2, and ResNet18, which is extremely helpful for deploying DNNs on resource-constrained edge devices. Figure 3 (a)-(c) shows the weight distributions in different classification models from the baseline, MCMIA, and the Min-Max game. We can observe that after pruning, the weights are much fewer than in the baseline model and the Min-Max game model (both without pruning).
Next, we investigate the classification loss of the baseline (without pruning or defense), MCMIA–Pruning, and the Min-Max game. Taking CIFAR-10-VGG16 as an example, the upper row of Figure 2 shows the classification loss of the baseline, MCMIA–Pruning, and the Min-Max game, respectively. The classification loss of MCMIA converges rapidly in less than 20 epochs. In addition, it has the highest final classification loss when the model is fully trained. In other words, MCMIA prevents overfitting instead of driving the classification loss on the training data arbitrarily low. We train the membership inference model based on the predicted outputs of the well-trained classification model, and plot the testing accuracy of the membership inference attack during the inference model training process in the lower row of Figure 2. The adversary attack accuracy is measured as the percentage of correct determinations among all of the adversary's determinations for the observed data records.
|Dataset||Model||Original Weights||Pruned Weights||Reduction|
|MNIST||LeNet||60 K||3.80 K||15.78 X|
|CIFAR-10/100||VGG16||13.83 M||1.08 M||12.8 X|
|CIFAR-10/100||ResNet18||11.17 M||1.06 M||10.54 X|
|CIFAR-10/100||MobileNetV2||3.46 M||0.34 M||10.06 X|
|ImageNet||ResNet18||11.17 M||3.47 M||3.37 X|
|ImageNet||MobileNetV2||3.46 M||1.06 M||3.27 X|
The experimental results for ImageNet are shown in Table III, which demonstrate that for MobileNetV2, the attack accuracy using MCMIA–Pruning is 2.11% lower than the baseline, and for ResNet18, the attack accuracy using pruning is approximately 5% lower than the baseline. The weight reduction ratio is 3.37X for ResNet18 and 3.27X for MobileNetV2 compared with the baseline weights.
V-D Evaluation Results on MCMIA–Pruning & Min-Max
The experimental results for MCMIA–Pruning & Min-Max are shown in Table IV. The results demonstrate that for CIFAR-10-VGG16, the attack accuracy of Pruning & Min-Max is 55.93%, which is 3.03% lower than the attack accuracy of MCMIA–Pruning. For CIFAR-100-VGG16, the attack accuracy of Pruning & Min-Max is 57.65%, which is 1% lower than the attack accuracy of MCMIA–Pruning. For MNIST-LeNet-5, the attack accuracy of Pruning & Min-Max is close to the attack accuracy of MCMIA–Pruning. Figure 3 (d) shows the distribution of weights in classification models from MCMIA–Pruning & Min-Max. We can again observe that after pruning, the weights are much fewer than in the baseline model.
V-E MCMIA Analysis
In general, for the same type of model, the more overfitted the model is, the more vulnerable it is to MIA; the less generic the distribution of the training data is, the more information it leaks. MCMIA achieves parameter sparsity by pruning non-critical weights, and thus can potentially reduce the overfitting caused by over-parameterization. Taking CIFAR-10-VGG16 as an example, we compare the predictions on training data and non-training data among the baseline, MCMIA–Pruning, and the Min-Max game, as shown in Figure 4. The baseline assigns high probability to the correct class on training data while predicting lower probabilities on testing data; such a difference makes it vulnerable to MIA. In contrast, MCMIA–Pruning and the Min-Max game have relatively similar predicted probabilities between training and testing data. To quantify the difference in predictions between training and non-training data, we plot the MIA accuracy along with the difference in classification accuracy between training and non-training data for each class in CIFAR-10 in Figure 5. We name this difference the training/non-training accuracy gap. As shown in Figure 5, there is a trend that the larger the training/non-training accuracy gap is, the higher the membership attack accuracy is. Among all four methods, MCMIA–Pruning & Min-Max achieves the lowest gap and the lowest membership inference attack accuracy, therefore providing the highest privacy enhancement. The comparison of overall training accuracy, testing accuracy, and membership inference attack accuracy is given in Table VI, which conveys a similar message as Figure 5.
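The link between the training/non-training gap and attack accuracy can be illustrated with a simple confidence-threshold attack on synthetic confidences (all numbers below are synthetic assumptions, not the paper's measurements):

```python
import numpy as np

def threshold_attack_accuracy(member_conf, nonmember_conf):
    """Best confidence-threshold attack: guess 'member' when the model's
    confidence is at least t, maximized over candidate thresholds."""
    best = 0.5
    for t in np.unique(np.concatenate([member_conf, nonmember_conf])):
        acc = 0.5 * ((member_conf >= t).mean() + (nonmember_conf < t).mean())
        best = max(best, acc)
    return best

rng = np.random.default_rng(0)
# Overfitted model: members receive much higher confidence than non-members.
overfit = threshold_attack_accuracy(rng.uniform(0.9, 1.0, 1000),
                                    rng.uniform(0.4, 0.8, 1000))
# Well-generalizing (e.g., pruned) model: the two distributions nearly match.
generalizing = threshold_attack_accuracy(rng.uniform(0.6, 0.9, 1000),
                                         rng.uniform(0.6, 0.9, 1000))
```

With a large gap the threshold attack is nearly perfect; with matched confidence distributions it degrades toward the 0.5 coin flip, mirroring the trend in Figure 5.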
In this work, we jointly formulate model compression and MIA as MCMIA and provide an analytic method of solving the problem. We evaluate our method with LeNet-5, VGG16, MobileNetV2, and ResNet18 on different datasets including MNIST, CIFAR-10, CIFAR-100, and ImageNet. From the experimental results, we see that model compression can significantly reduce the information leakage from MIA, and our proposed method outperforms DP in defending against MIA. Compared with MCMIA–Pruning, our MCMIA–Pruning & Min-Max game achieves the lowest attack accuracy, thereby maximally enhancing DNN model privacy. Thanks to the hardware-friendly characteristics of model compression (reduced weight storage and computational cost), our proposed MCMIA is very helpful for deploying DNNs on resource-constrained edge devices. We hope our proposed method will shed some light on the increasing membership privacy concerns when applying DNNs to user-sensitive data such as business and medical datasets in the era of edge computing.
-  Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems. (2012) 1097–1105
-  He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition.
-  Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Advances in neural information processing systems. (2017) 5998–6008
-  Ribeiro, M., Grolinger, K., Capretz, M.A.: Mlaas: Machine learning as a service. In: 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), IEEE (2015) 896–902
-  Kurniawan, A.: Learning AWS IoT: Effectively manage connected devices on the AWS cloud using services such as AWS Greengrass, AWS button, predictive analytics and machine learning. Packt Publishing Ltd (2018)
-  Gollob, D.: Microsoft Azure: Planning, Deploying, and Managing Your Data Center in the Cloud. Springer-Verlag Berlin Heidelberg (2015)
-  Fan, X., Iacob, M., Nicolae, M., Dong, E.: Machine learning basics with IBM data science experience. In: Proceedings of the 27th Annual International Conference on Computer Science and Software Engineering, IBM Corp. (2017) 340–340
-  Ravulavaru, A.: Google Cloud AI Services Quick Start Guide: Build Intelligent Applications with Google Cloud AI Services. Packt Publishing Ltd (2018)
-  Truex, S., Liu, L., Gursoy, M.E., Yu, L., Wei, W.: Demystifying membership inference attacks in machine learning as a service. IEEE Transactions on Services Computing (2019)
-  Shokri, R., Stronati, M., Song, C., Shmatikov, V.: Membership inference attacks against machine learning models. In: 2017 IEEE Symposium on Security and Privacy (SP), IEEE (2017) 3–18
-  Hinton, G., Deng, L., Yu, D., Dahl, G.E., Mohamed, A.r., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T.N., Kingsbury, B.: Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine 29(6) (2012) 82–97
-  Abadi, M., Chu, A., Goodfellow, I., McMahan, H.B., Mironov, I., Talwar, K., Zhang, L.: Deep learning with differential privacy. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. (2016) 308–318
-  Bassily, R., Smith, A., Thakurta, A.: Private empirical risk minimization: Efficient algorithms and tight error bounds. In: 2014 IEEE 55th Annual Symposium on Foundations of Computer Science, IEEE (2014) 464–473
-  Zhang, X., Huang, C., Liu, M., Stefanopoulou, A., Ersal, T.: Predictive cruise control with private vehicle-to-vehicle communication for improving fuel consumption and emissions. IEEE Communications Magazine 57(10) (2019) 91–97
-  Chaudhuri, K., Monteleoni, C., Sarwate, A.D.: Differentially private empirical risk minimization. Journal of Machine Learning Research 12(Mar) (2011) 1069–1109
-  Rahman, M.A., Rahman, T., Laganière, R., Mohammed, N., Wang, Y.: Membership inference attack against differentially private deep learning model. Transactions on Data Privacy 11(1) (2018) 61–79
-  Karpathy, A., Fei-Fei, L.: Deep visual-semantic alignments for generating image descriptions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (2015) 3128–3137
-  Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
-  Han, S., Pool, J., Tran, J., Dally, W.: Learning both weights and connections for efficient neural network. In: Advances in neural information processing systems. (2015) 1135–1143
-  Wen, W., Wu, C., Wang, Y., Chen, Y., Li, H.: Learning structured sparsity in deep neural networks. In: Advances in Neural Information Processing Systems. (2016) 2074–2082
-  Guo, Y., Yao, A., Chen, Y.: Dynamic network surgery for efficient dnns. In: Advances In Neural Information Processing Systems. (2016) 1379–1387
-  Zhang, T., Ye, S., Zhang, K., Tang, J., Wen, W., Fardad, M., Wang, Y.: A systematic dnn weight pruning framework using alternating direction method of multipliers. In: Proceedings of the European Conference on Computer Vision (ECCV). (2018) 184–199
-  Xiao, X., Wang, Z., Rajasekaran, S.: Autoprune: Automatic network pruning by regularizing auxiliary parameters. In: Advances in Neural Information Processing Systems. (2019) 13681–13691
-  Ren, A., Zhang, T., Ye, S., Li, J., Xu, W., Qian, X., Lin, X., Wang, Y.: Admm-nn: An algorithm-hardware co-design framework of dnns using alternating direction methods of multipliers. In: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems. (2019) 925–938
-  Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning 3(1) (2011) 1–122
-  Ouyang, H., He, N., Tran, L., Gray, A.: Stochastic alternating direction method of multipliers. In: International Conference on Machine Learning. (2013) 80–88
-  Hong, M., Luo, Z.Q., Razaviyayn, M.: Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. SIAM Journal on Optimization 26(1) (2016) 337–364
-  Liu, S., Chen, J., Chen, P.Y., Hero, A.: Zeroth-order online alternating direction method of multipliers: Convergence analysis and applications. In: International Conference on Artificial Intelligence and Statistics. (2018) 288–297
-  Nasr, M., Shokri, R., Houmansadr, A.: Machine learning with membership privacy using adversarial regularization. In: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. (2018) 634–646
-  Salem, A., Zhang, Y., Humbert, M., Berrang, P., Fritz, M., Backes, M.: Ml-leaks: Model and data independent membership inference attacks and defenses on machine learning models. arXiv preprint arXiv:1806.01246 (2018)
-  Alvim, M.S., Chatzikokolakis, K., Kawamoto, Y., Palamidessi, C.: Information leakage games. In: International Conference on Decision and Game Theory for Security, Springer (2017) 437–457
-  Hsu, J., Roth, A., Ullman, J.: Differential privacy for the analyst via private equilibrium computation. In: Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing. STOC '13, New York, NY, USA, Association for Computing Machinery (2013) 341–350
-  Manshaei, M.H., Zhu, Q., Alpcan, T., Bacşar, T., Hubaux, J.P.: Game theory meets network security and privacy. ACM Computing Surveys (CSUR) 45(3) (2013) 1–39
-  Shokri, R.: Privacy games: Optimal user-centric data obfuscation. Proceedings on Privacy Enhancing Technologies 2015(2) (2015) 299–315
-  Shokri, R., Theodorakopoulos, G., Troncoso, C., Hubaux, J.P., Le Boudec, J.Y.: Protecting location privacy: optimal strategy against localization attacks. In: Proceedings of the 2012 ACM conference on Computer and communications security. (2012) 617–627
-  Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in neural information processing systems. (2014) 2672–2680
-  Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Theory of cryptography conference, Springer (2006) 265–284
-  Iyengar, R., Near, J.P., Song, D., Thakkar, O., Thakurta, A., Wang, L.: Towards practical differentially private convex optimization. In: 2019 IEEE Symposium on Security and Privacy (SP), IEEE (2019) 299–316
-  Jia, J., Salem, A., Backes, M., Zhang, Y., Gong, N.Z.: Memguard: Defending against black-box membership inference attacks via adversarial examples. In: Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security. (2019) 259–274
-  LeCun, Y.: Lenet-5, convolutional neural networks. URL: http://yann.lecun.com/exdb/lenet (2015)
-  Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: International Conference on Learning Representations (ICLR). (2015)
-  Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: Mobilenetv2: Inverted residuals and linear bottlenecks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. (2018) 4510–4520
-  Deng, L.: The mnist database of handwritten digit images for machine learning research [best of the web]. IEEE Signal Processing Magazine 29(6) (2012) 141–142
-  Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images. Technical report, Citeseer (2009)