Advances in machine learning (ML) techniques are enabling highly accurate prediction, analytics extraction, classification and recommendation tasks for a wide range of applications. However, the success of these models largely depends on access to well-provisioned, computationally powerful platforms as well as the availability of a substantial amount of training data. Third-party Machine-Learning-as-a-Service (MLaaS) providers like Google and Amazon have successfully addressed these two challenges by providing publicly accessible high-performance computing infrastructure combined with ML algorithms trained on enormous amounts of data. Working as a black-box API, MLaaS enables users to upload their own dataset to the server and obtain an ML model trained on it. Organisations, public and private alike, as well as data scientists and researchers, now use MLaaS platforms to gain insights from data collected from a vast range of sources.
The availability of such services raises certain safety and privacy concerns, as for many domains the use of sensitive data in learning is inevitable. For example, social media researchers utilize ML to analyze human behaviours through massive social media data [liu2019socinf]. Similarly, medical records are analyzed by vendors who provide healthcare or insurance to assess the likelihood of certain health conditions [12rahman2018membership]. In certain scenarios, biomedical data [backes2016membership] and location data [pyrgelis2017knock] are also deemed sensitive in nature, and therefore raise data privacy issues.
Although the structure of the learning model is generally hidden, as a user trains the model on the provider’s server, the centralized server presumably has access to the records it was trained on, which could potentially be misused if the presence of a record in the training data is exposed. In addition, some services allow the data owner to extend a pay-per-query model, where other users can query the owner’s learning model [tramer2016stealing]. Because of the publicly available querying platform, adversarial attacks are feasible by probing the outcome of the model to gain knowledge of the model structure or of some users’ records [10shokri2017membership, 18truex2018towards, attriguard, veale2018algorithms, liu2019socinf, hisamoto2019membership, schonherr2018adversarial, papernot2016limitations].
Membership Inference Attack (MIA) [10shokri2017membership] is one of the most critical inference attacks, achieving substantial success against ML models even when the model structure is not shared. A successful MIA can pose a severe threat to user privacy by identifying that the data of a particular user is contained in a dataset. For instance, knowing that a person’s record was part of the data used to analyse suicidal behaviours [liu2019socinf] or the movements of Alzheimer’s patients [pyrgelis2017knock] reveals information about the person’s suicidal tendency and health condition, respectively.
Hence, it is crucial to understand the reasons behind a successful MIA on a model and the possible information leakage through the attack. Although many studies [salem2018ml, nasr2018machine, 17hayes2019logan] have established a link between MIA and certain ML model properties (e.g. overfitting, choice of hyperparameters), the effect of other data and model properties is yet to be studied. This research aims at protecting user data privacy against MIA by establishing a relation between ML models and MIA, measuring MIA’s effectiveness across multiple data and model properties. Furthermore, this research illustrates ways to improve the robustness of a model against MIA by using the model properties as regularizers in the model.
Contributions: Most solutions to improve ML models’ resilience against MIA focus on reducing the overfitting of the model to make it resistant to such attacks. However, in addition to the model’s overfitting, several other underlying data and model properties could contribute to the success of these attacks. For example, data properties such as data size, balance in the classes and features, and entropy, as well as model characteristics such as group, predictive and individual fairness and MIA-indistinguishability, might play an important role in determining the success rate of MIA. No comprehensive study has been conducted so far in the literature to investigate which of these factors significantly impact the attack accuracy, so that defence methods can be developed to prevent MIA considering those factors. In this paper, we conduct an exploratory analysis of the influence of different data and model properties on the success of MIA and, based on the findings of our study, provide recommendations for developing defence methods to improve models’ resistance. In summary, our contributions are:
Identifying the correlation of different data and model properties with MIA’s success and their potential impact: some properties show a strong positive correlation with MIA (for example, the fairness of the model), while others show negative correlations (for example, balance in the classes). However, for a few properties, such as the number of features and entropy, we could not find any straightforward correlation with MIA, which we intend to explore further in future work;
Minimizing information leakage in ML models by reducing MIA accuracy based on these findings: we propose using influential model properties, such as the model’s fairness and the mutual information between the records and the model parameters, as regularizers in the model for improved defence against MIA;
Studying the effectiveness of the recommended defence methods: we demonstrate that models implemented with the above-mentioned custom regularizers reduce MIA accuracy at a higher rate and improve model performance compared to models without any regularizer and models with the L1 or L2 regularizer.
The rest of the paper is structured as follows. Section 2 provides the background and Section 3 discusses related work. Section 4 describes the methodology used in this research and the experimental setup. The results obtained from the experimental study are discussed in Section 5. Based on the obtained results, we then present our new defence methods to improve ML models’ resilience to MIA in Section 6. Finally, we conclude with remarks on future work in Section 7.
2 Background
2.1 Machine Learning Preliminaries
In supervised learning, an ML model is trained on a set of data points to capture their inherent features and map these features to a set of predefined output labels. The aim of the training is to create a model that is capable of predicting the label of a new, unlabeled data point.
Assume $D = \{d_1, \dots, d_n\}$ is the set of $n$ data points sampled from a probability distribution $\Pr(X, Y)$. Each data point $d_i = (x_i, y_i)$ consists of a feature vector $x_i \in X$, where $X$ is the feature space, and a class label $y_i \in Y$, where $Y$ is a predefined set of class labels. An ML algorithm attempts to identify a function that maps the input data points to different classes. The output is often a probability vector that indicates the relative association of a data point to each class label in $Y$. Consider model $f$ as:
$$f_\theta : X \rightarrow Y$$
where $\theta$ represents the model parameters. The target is to find a function $f$ that minimizes the expected loss of prediction:
$$L(f) = \mathbb{E}_{(x, y) \sim \Pr(X, Y)}\big[\ell(f_\theta(x), y)\big]$$
The empirical loss of the model over the training dataset $D$ can be defined as:
$$L_D(f) = \frac{1}{n} \sum_{i=1}^{n} \ell(f_\theta(x_i), y_i)$$
where $n$ is the number of data points in $D$.
An overfitted model that captures the exact feature-to-label mapping is more likely to produce incorrect predictions when encountering new data points. In order to prevent the model from overfitting to a particular dataset, and to achieve better generalization for all data points sampled from similar distributions, different regularizers are often used in practice. A regularizer penalizes the model’s complexity. Therefore, the optimization problem of the model with parameters $\theta$ is to minimize the regularized empirical loss:
$$\min_{\theta} \; L_D(f_\theta) + \lambda R(\theta)$$
where $R(\theta)$ is the regularizing function with a weight $\lambda$. A popular method for training the model is to repeatedly update the model parameters in the direction of the gradients to achieve the lowest possible empirical loss.
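As a minimal illustration of this optimization, the sketch below runs gradient descent on a hypothetical one-parameter toy loss plus an L2 penalty $R(\theta) = \|\theta\|^2$; the objective and step sizes are illustrative assumptions, not the models used in this paper.

```python
def l2_regularized_step(theta, grad_loss, lam=0.01, lr=0.1):
    """One gradient-descent update on L_D(theta) + lam * ||theta||^2.

    grad_loss is the gradient of the empirical loss at theta; the L2
    regularizer contributes 2 * lam * theta to the total gradient.
    """
    return [t - lr * (g + 2 * lam * t) for t, g in zip(theta, grad_loss)]

# Toy empirical loss (theta - 3)^2, so dL/dtheta = 2 * (theta - 3).
# The penalty pulls the optimum from 3 towards 0: fixed point 3 / (1 + lam).
theta = [0.0]
for _ in range(200):
    theta = l2_regularized_step(theta, [2 * (theta[0] - 3)], lam=0.01, lr=0.1)
```

Larger `lam` shrinks the parameters harder, trading training fit for generalization, which is exactly the overfitting control discussed above.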
2.2 Membership Inference Attack (MIA)
In this work, we use MIA as first proposed in [10shokri2017membership], which aims to learn a record’s presence in the training dataset of an ML model without knowing the structure of the model, to evaluate the information leakage in ML models. The attack is based on several assumptions. Firstly, the attacker has black-box oracle access to the model and can acquire the model’s prediction vector on any data record. Secondly, the data distribution of the target model’s inputs and outputs, including their number and the range of values they can take, is known to the adversary. The adversary intends to distinguish training-set members from non-members by observing the model’s predictions.
The attack consists of training three different models: 1) the target model, 2) the shadow model and 3) the attack model. The target model is the model of interest from which the adversary wants to learn sensitive information about individuals. The structure of this model and the training dataset used are essentially kept hidden. The adversary has access to the target model to perform queries on it and obtain some aggregated statistics on the data. The purpose of a shadow model is to imitate the target model’s behaviour and produce outputs similar to it. As the target model’s structure is not known, the adversary implements multiple shadow models by sampling data from distributions similar to the target model’s training data. Finally, the attack model is a classifier that categorizes records into member and non-member classes (i.e., used in the target model’s training data or not, respectively). The attack model is trained on the prediction vectors obtained from the shadow models and tested against the prediction vectors of the target model (ground truth). Algorithm 1 illustrates the steps of implementing MIA.
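The shadow-model step can be sketched as follows. This is a hypothetical toy pipeline (a nearest-centroid "model" on synthetic 1-D data stands in for the ANNs used in the experiments); the point is only the structure of the attack training set: the shadow model's prediction vectors, labelled 1 for shadow-training members and 0 for held-out non-members.

```python
import random

def train_model(data):
    """Toy 'model': stores per-class means; predict returns a confidence vector."""
    by_class = {}
    for x, y in data:
        by_class.setdefault(y, []).append(x)
    centers = {y: sum(v) / len(v) for y, v in by_class.items()}
    def predict(x):
        # Inverse-distance scores, normalized into a probability-like vector.
        inv = {y: 1.0 / (1e-6 + abs(x - c)) for y, c in centers.items()}
        total = sum(inv.values())
        return [inv[y] / total for y in sorted(inv)]
    return predict

def make_attack_training_set(members, non_members, shadow_predict):
    """Label the shadow model's predictions: members -> 1, non-members -> 0."""
    rows = [(shadow_predict(x), 1) for x, _ in members]
    rows += [(shadow_predict(x), 0) for x, _ in non_members]
    return rows

random.seed(0)
# Synthetic 1-D data from two classes; half trains the shadow model, half is held out.
data = [(random.gauss(y, 1.0), y) for _ in range(50) for y in (0, 1)]
shadow_train, shadow_out = data[:50], data[50:]
shadow_predict = train_model(shadow_train)
attack_rows = make_attack_training_set(shadow_train, shadow_out, shadow_predict)
# An attack classifier would now be trained on attack_rows and then applied
# to the target model's prediction vectors to infer membership.
```

The intuition the attack exploits is that prediction vectors for members tend to be more confident than for non-members, which the attack classifier learns to separate.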
3 Related Works
Membership Inference Attack (MIA). Adversarial attacks that can gain insight about the records or the model without knowing the model structure (black-box attacks) are allegedly more devastating [dwork2017exposed, tramer2016stealing]. Membership Inference Attack (MIA) [10shokri2017membership, 18truex2018towards] is one such attack, where the adversary successfully manages to learn whether a record is part of the private training data, even if the model’s structure is not disclosed to the adversary. Since its inception, MIA has shown tremendous success on models such as Artificial Neural Networks (ANN) [10shokri2017membership, salem2018ml], Generative Adversarial Networks (GAN) [17hayes2019logan, chen2019gan, hilprecht2019monte] and differentially private models [12rahman2018membership] in a range of domains such as social media [liu2019socinf], health [backes2016membership], sequence-to-sequence video captioning [hisamoto2019membership] and user mobility [pyrgelis2017knock]. To attack a model by simulating its behaviour, the attacker deploys multiple shadow models [10shokri2017membership]. However, Salem et al. [salem2018ml] later showed that an attack that uses only one shadow model, or no shadow model at all, can still render strong membership inference.
Defenses against MIA. A successful MIA exploits a model’s tendency to yield higher confidence values when encountering data it was trained on (members) than other data. The property of a model to overfit towards its training data makes it vulnerable to MIA [yeom2018privacy]. Thus, existing defences against MIA attempt to reduce the overfitting of a model by applying regularizers like the L2-regularizer [10shokri2017membership], which generalizes a model’s predictions, or by adding Dropout layers to the model [19srivastava2014dropout], which ignore a few neurons in each iteration of training to avoid high train accuracy. Nasr et al. [nasr2018machine] proposed a min-max game-theoretic defence method that uses the highest possible attack accuracy as an adversarial regularization to decrease the target model’s prediction loss while increasing privacy against MIA. Differential privacy [21mcsherry2009differentially] is also used in another proposed defence method against membership inference. However, the privacy guarantee of the differentially private method is limited to a certain value of the privacy budget [12rahman2018membership, leino2019stolen]. Furthermore, recent works reveal that overfitting is a sufficient but not a necessary condition for a successful MIA [long2018understanding]: MIA is shown to be strong even when the models are well-generalized. Therefore, it is necessary to identify other reasons for a model’s vulnerability to MIA.
Fairness and MIA-indistinguishability. Fairness in ML [verma2018fairness, gajane2017formalizing, barocas2017fairness] is an emerging concept that determines how much a model deviates from producing predictions with equal probabilities for individuals across different protected groups. Similar to ML fairness, [yaghini2019disparate] defines MIA-indistinguishability as a measure of a model’s vulnerability to MIA, estimated from the model’s discrimination in prediction between member and non-member records. Intuitively, besides overfitting, both MIA-indistinguishability and a model’s fairness in general can be possible reasons behind MIA’s success.
4 Methodology
The overall research is based on an exploratory research method that consists of two major stages. In the first stage, MIA is implemented on customized datasets and multiple models to systematically evaluate its performance and determine the correlation of different data and model properties with the success of the attack. In the second stage, we study the effectiveness of using multiple model-based properties to improve a model’s resilience against MIA by applying them as regularizers, and compare with a model without any regularizer and with the two standard $L_1$- and $L_2$-norm regularizers.
4.1 Explored Properties and Their Measures
In an ML setting, a dataset $D$ can be considered as a collection of $n$ records or data points, each with a feature vector $x_i$ drawn from a feature space $X$ and a class label from a set of class labels $Y$. Each feature contains a different level of entropy and balance in its feature values. Also, the records of each dataset may have different balance levels between the class labels in $Y$.
Data properties: As an ML model is data-dependent, data properties such as size and entropy of the dataset, number of selected features and balances in their values and balances in the classes can affect a model’s prediction tremendously. Therefore, these data properties might have an impact on MIA’s success as well. Measures of different data properties studied in this research are described below:
Entropy of the overall dataset $D$ can be measured by taking the mean entropy over the $m$ features:
$$H(D) = \frac{1}{m} \sum_{j=1}^{m} H(F_j)$$
where $F_j$ denotes the values taken by the data points for the $j$-th feature. $H(F_j)$ is the entropy of feature $F_j$, which can be computed using Shannon’s entropy formula [entropy_2019] as below:
$$H(F_j) = -\sum_{v \in V_j} \Pr(v) \log_2 \Pr(v)$$
where $V_j$ is the set of possible feature values for a single feature $F_j$.
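The mean-entropy measure above can be sketched directly (assuming discrete feature values; the paper's exact per-feature binning for continuous features is not specified here):

```python
import math
from collections import Counter

def feature_entropy(values):
    """Shannon entropy (in bits) of one feature column."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def dataset_entropy(rows):
    """Mean entropy over the feature columns of the dataset."""
    cols = list(zip(*rows))
    return sum(feature_entropy(col) for col in cols) / len(cols)

# A uniform binary feature carries 1 bit; a constant feature carries 0 bits,
# so this toy dataset has mean entropy (1.0 + 0.0) / 2 = 0.5.
rows = [(0, 7), (1, 7), (0, 7), (1, 7)]
print(dataset_entropy(rows))
```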
Balance in the classes can be measured as the frequency ratio between different classes in the dataset. For instance, the balance between two classes $c_1$ and $c_2$ of a dataset can be measured as:
$$B_{class} = \frac{|\{d_i \in D : y_i = c_1\}|}{|\{d_i \in D : y_i = c_2\}|}$$
In the case of multiple classes, we consider the ratio between one selected class ($c_1$) and all the other classes. To simplify, throughout the paper, class balances are denoted as the percentage of one of the class labels. For example, a 10% class balance refers to a dataset having 10% of its records labelled as class $c_1$.
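A possible implementation of this percentage measure (the function name and label convention are illustrative):

```python
from collections import Counter

def class_balance(labels, positive=1):
    """Percentage of records carrying the selected class label.

    E.g. a return value of 10.0 corresponds to the '10% class balance'
    notation used in the text.
    """
    return 100.0 * Counter(labels)[positive] / len(labels)

labels = [1] * 10 + [0] * 90
print(class_balance(labels))  # 10.0
```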
We can calculate the feature balance as the ratio between one feature value and all the other feature values in $V_j$, where $V_j$ is the set of possible feature values for a single feature $F_j$. If we consider multiple features as a single feature by concatenating all the feature values of a record into one, then the set of all possible combinations of feature values can be denoted as $V = V_1 \times V_2 \times \dots \times V_m$. We measure the feature balance as the ratio between one selected feature-value combination $v^* \in V$ and the other combinations, and denote the balance as the percentage of the chosen combination. So, the balance in the features can be measured as:
$$B_{feat} = \frac{|\{d_i \in D : x_i = v^*\}|}{|\{d_i \in D : x_i \neq v^*\}|}$$
We also explore the size of the dataset and the number of features, measured as the number of records and features in the dataset respectively, for their effect on MIA’s success.
Model properties: The selection of a suitable classifier as the ML model and the choice of different hyperparameters, such as the number of hidden layers, nodes per layer and the learning rate, contribute highly to achieving higher output accuracy [takahashi2018framework], which in turn may also affect MIA’s success. Also, in earlier works [nasr2018machine, 10shokri2017membership], a model’s overfitting, measured in terms of the difference between the train and test accuracy of a model, is considered to be the primary contributor to MIA’s success.
In addition to exploring the above properties, we calculate the mutual information between the model parameters and the records, as a measure of the amount of information extracted by the model after observing the features, and evaluate its impact on MIA. Assume an ML function $f_\theta$ maps the input data points to different classes by generating the model’s parameters $\theta$ based on all the features of the training set. Thus, the mutual information can be calculated as:
$$I(D; \theta) = \frac{1}{n} \sum_{i=1}^{n} I(d_i; \theta)$$
where $I(d_i; \theta)$ is the mutual information between a record $d_i$ and the parameters $\theta$ generated by the model. This mutual information can be calculated using the equation below:
$$I(d_i; \theta) = H(d_i) - H(d_i \mid \theta)$$
where $H(d_i)$ is the marginal entropy of one record and $H(d_i \mid \theta)$ is the conditional entropy that quantifies the amount of information needed to describe $d_i$ when the value of $\theta$ is known. We compute the value of $\theta$ by taking the mean value of the parameters produced in the multiple layers of the Neural Network for each feature.
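A generic plug-in estimate of discrete mutual information, consistent with the identity $I(X;Y) = H(X) - H(X \mid Y) = H(X) + H(Y) - H(X, Y)$, can be sketched as follows. This is a sketch over discretized values, not the paper's exact procedure for averaging network parameters:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    def entropy(counts):
        return -sum((c / n) * math.log2(c / n) for c in counts.values())
    h_x = entropy(Counter(xs))
    h_y = entropy(Counter(ys))
    h_xy = entropy(Counter(zip(xs, ys)))
    return h_x + h_y - h_xy  # equals H(X) - H(X|Y)

# Perfectly dependent variables share 1 bit; independent ones share 0 bits.
print(mutual_information([0, 1, 0, 1], [0, 1, 0, 1]))  # 1.0
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0
```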
We also explore a model’s fairness in prediction, which measures the model’s bias in predicting the preferred class for individuals grouped based on a particular protected feature. Assume ‘Gender’ is a protected feature in a dataset containing two feature values: ‘male’ and ‘female’. Fairness of an ML model on this dataset would mean that the model treats both ‘male’ and ‘female’ records equally in terms of prediction, without giving benefit to one group over the other. From the many different definitions of ML fairness in the existing literature [gajane2017formalizing, verma2018fairness, binns2017fairness, barocas2017fairness], we consider three fairness notions in this study: group or statistical fairness, predictive fairness and individual fairness.
Group Fairness: If a model predicts a particular outcome for the individuals across the protected subgroups with almost equal probabilities [gajane2017formalizing], the model is considered to have group fairness. A predictor achieves group fairness with respect to two groups of records $G_1$ and $G_2$ iff
$$\Pr(\hat{Y}_{G_1} = c) = \Pr(\hat{Y}_{G_2} = c)$$
where $\hat{Y}_{G_1}$ and $\hat{Y}_{G_2}$ are the predicted outcomes for the records in groups $G_1$ and $G_2$ respectively, and $c$ is the preferred class label that needs to be predicted fairly for all groups of records.
To determine the group fairness of a model, $F_{grp}$, we estimate the probabilistic difference of prediction between the two subgroups of records for $k$ class labels as below:
$$F_{grp} = \frac{1}{k} \sum_{c=1}^{k} \left| \Pr(\hat{Y}_{G_1} = c) - \Pr(\hat{Y}_{G_2} = c) \right|$$
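An empirical version of this measure, estimated from the predicted labels of two groups, can be sketched as follows (a sketch only; an estimator could equally be built on predicted probabilities):

```python
def group_fairness_gap(preds_g1, preds_g2, classes):
    """Mean absolute gap in per-class prediction rates between two groups.

    Zero means the two groups are predicted into every class at equal
    rates, i.e. group fairness under the definition above.
    """
    def rate(preds, c):
        return sum(1 for p in preds if p == c) / len(preds)
    return sum(abs(rate(preds_g1, c) - rate(preds_g2, c))
               for c in classes) / len(classes)

# Identical prediction rates across groups -> gap 0; disjoint -> gap 1.
print(group_fairness_gap([1, 0, 1, 0], [0, 1, 0, 1], [0, 1]))  # 0.0
print(group_fairness_gap([1, 1], [0, 0], [0, 1]))              # 1.0
```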
Predictive Fairness: A classifier has predictive fairness if the subgroups of records that truly belong to the preferred class have equal probabilities of being predicted in that class [verma2018fairness]. That is,
$$\Pr(\hat{Y}_{G_1} = c \mid Y = c) = \Pr(\hat{Y}_{G_2} = c \mid Y = c)$$
where $c$ is the preferred class label. Based on Equation (9), we measure the predictive fairness $F_{pred}$ for $k$ class labels as below:
$$F_{pred} = \frac{1}{k} \sum_{c=1}^{k} \left| \Pr(\hat{Y}_{G_1} = c \mid Y = c) - \Pr(\hat{Y}_{G_2} = c \mid Y = c) \right|$$
Individual Fairness: The concept of individual fairness, introduced in [dwork2012fairness], ascertains that a predictor is fair if it produces similar outputs for similar individuals. That is, if two records $d_i$ and $d_j$ are similar, then the model’s predictions on them, $f(d_i)$ and $f(d_j)$, should also be similar [gajane2017formalizing]:
$$D(f(d_i), f(d_j)) \leq d(d_i, d_j)$$
where $d(d_i, d_j)$ is the distance between the records $d_i$ and $d_j$, and $D(f(d_i), f(d_j))$ is the distance between the model’s predictions for $d_i$ and $d_j$. In our experiments, to measure the individual fairness $F_{ind}$, we consider the difference between the two distances $D$ and $d$ derived from Equation (11):
$$F_{ind} = \big| D(f(d_i), f(d_j)) - d(d_i, d_j) \big|$$
Both $D$ and $d$ can be measured in different ways. In [dwork2012fairness], a statistical distance metric is proposed for $D$ that measures the total variation norm between the two probability vectors $p = f(d_i)$ and $q = f(d_j)$ output by the classifier for the two considered records:
$$D_{tv}(p, q) = \frac{1}{2} \sum_{c \in Y} |p_c - q_c|$$
This metric assumes that the distance metric selected for $d$ will scale the measured distance within the $[0, 1]$ range. They also suggest a better choice for $D$ using the relative $\ell_\infty$ norm metric:
$$D_{\infty}(p, q) = \sup_{c \in Y} \log \max\left\{ \frac{p_c}{q_c}, \frac{q_c}{p_c} \right\}$$
which would allow the use of a metric for $d$ that considers two records to be similar if $d(d_i, d_j) \leq 1$ and dissimilar otherwise. The authors also discuss some forward-looking insights into identifying a proper metric for $d$ (refer to [dwork2012fairness] for details).
For simplicity of the calculation, we use the statistical distances between two sets of records and between the predictions on them (Equation (13)) to compute $d$ and $D$, respectively, as below:
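The total-variation (statistical) distance on probability vectors used in these measures can be computed directly:

```python
def statistical_distance(p, q):
    """Total-variation distance between two probability vectors:
    0.5 * sum_c |p_c - q_c|, which lies in [0, 1]."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

print(statistical_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 (maximally different)
print(statistical_distance([0.5, 0.5], [0.5, 0.5]))  # 0.0 (identical)
```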
In addition to the model’s fairness, we also explore MIA-indistinguishability [yaghini2019disparate]. A model can be said to be MIA-indistinguishable if the probabilities of a record’s presence in the training dataset and in the test dataset are the same. The target model satisfies perfect MIA-indistinguishability if, for any prediction $\hat{y}$:
$$\Pr(m = 1 \mid \hat{y}) = \Pr(m = 0 \mid \hat{y})$$
where $m$ is the membership value denoting whether the record is a member of the train data ($m = 1$) or the test data ($m = 0$).
How much a model deviates from being indistinguishable can be measured using the $\ell_\infty$-relative metric between the considered probabilities, taking the maximum divergence across the classes of $\hat{y}$ [yaghini2019disparate]:
$$\Delta_{MIA} = \max_{\hat{y}} \left| \log \frac{\Pr(m = 1 \mid \hat{y})}{\Pr(m = 0 \mid \hat{y})} \right|$$
Thus, a model is presumably less vulnerable to MIA if $\Delta_{MIA}$ is close to zero. We evaluate MIA-indistinguishability for different member and non-member ratios. To simplify, we denote the ratios as member rates, representing the percentage of training records (member records) sampled from the target dataset.
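Such a divergence can be sketched as the maximum absolute log-ratio across classes, in the spirit of the relative $\ell_\infty$ metric; this is an assumed form for illustration, and the estimator used in the experiments may differ:

```python
import math

def mia_divergence(p_member, p_nonmember):
    """Maximum absolute log-ratio between the member and non-member
    prediction distributions across classes. Values near zero suggest
    the model is close to MIA-indistinguishable."""
    return max(abs(math.log(a / b)) for a, b in zip(p_member, p_nonmember))

# Identical distributions -> divergence 0; skewed confidence on members -> positive.
print(mia_divergence([0.5, 0.5], [0.5, 0.5]))  # 0.0
print(mia_divergence([0.8, 0.2], [0.4, 0.6]) > 0)  # True
```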
4.2 Experimental Setup
We apply and evaluate MIA as outlined in Algorithm 1 in multiple experimental setups by varying the properties described in Section 4.1. In this section, we explain how the data are customized, along with the model setup for the different experiments.
Table 1: Hyperparameter combinations explored for the ANN models.
|Hidden layers|1-layer to 5-layers|
|Number of nodes|5, 50, 100, 500|
|Learning rates|0.00001, 0.0001, 0.001, 0.01, 0.1|
|L2-ratios|0, 0.001, 0.01, 0.1, 1, 2, 3|
Table 2: Hyperparameter choices for the classifiers used as shadow models.
|Logistic Regression (LR)|= 0.01, solver = LBFGS|
|Support Vector Machine (SVM)|= 0.01, kernel = RBF|
|Random Forest (RF)|n-estimators = 100|
|K-Nearest Neighbour (KNN)| |
|Artificial Neural Network (ANN)|= 0.001, solver = SGD|
We select three extensively used datasets in studying MIA [10shokri2017membership, attriguard, schonherr2018adversarial] and create multiple modified versions of them to perform different experiments. The datasets are pre-processed as follows:
UCI Adult dataset [uci_adult]: This dataset contains individuals’ records classified into two groups based on whether a person makes over $50k per year. The records contain census features such as age, gender, education, marital status, occupation and working hours. In different experiments, we use records randomly sampled from this dataset.
Purchase dataset [kaggle]: To prepare the dataset, we join two datasets from [kaggle] containing the customers’ purchasing records (“transactions”) and the incentives offered to them (“history”). By joining the two, we obtain 16 features such as chain, category, purchase quantity, purchase amount, offer, market and repeater. We prepare the primary dataset by randomly sampling 400,000 records. In different experiments, we then use 10,000-100,000 records randomly sampled from the primary dataset. We label the records into 2 classes using K-means clustering so that the balance between the classes can be measured during different experiments.
Texas hospital dataset [texas]: This dataset is prepared from the publicly available Texas Hospital Discharge Data, released by the Texas Department of State Health Services, with information on inpatient stays in several health facilities. We use records randomly sampled from the years 2006 to 2009 and 16 features such as the patient’s gender, county, race, principal surgical procedure code and day, risk of mortality and illness severity. We label the records into two classes denoting whether a patient received an immediate response, calculated as the difference between the date of admission and the date of the principal surgery. In different experiments, we use records randomly sampled from this primary dataset.
From the pre-processed datasets, to evaluate the effect of the data sizes on MIA, we vary the number of records used in the target model at regular intervals for the Purchase, Texas and Adult datasets, with a similar number of distinct records in the shadow model. To evaluate the impact of the balance in the classes on MIA, we sample records varying the percentage of the class label ‘1’ at regular intervals. For the experiments on feature balances, we create datasets with only 5 selected features and sample records varying their feature balances at regular intervals. For analysing the effect of the number of features on MIA, we generate multiple datasets with a gradually increasing number of features. To understand the impact of the model fairness properly, we consider datasets with 5 features having both the classes and the features balanced at regular intervals. For all experiments we use a 75% member rate while splitting the target dataset into train and test records, except for the experiment on MIA-indistinguishability, where we measure MIA-indistinguishability for member rates from 10% to 90% with a 10% interval. In all the other experiments, we use datasets with all the features and with two levels of balance (10% and 50%) in the classes.
Table 3: Pearson’s correlation coefficients between the evaluated properties and attack accuracy.
|Balance in the classes|-0.940|-0.181|-0.015|
|No. of features|-0.146|-0.181|-0.187|
|Balance in the features|0.011|0.045|0.009|
The default ANN model used as the target, shadow and attack model for every experiment is structured as a one-hidden-layer network with solver=sgd. Furthermore, to reduce the computational complexity, we use only one shadow model, following the work of Salem et al. [salem2018ml], as their experiments show that similar attack accuracy is observed when using one shadow model instead of multiple shadow models as in [10shokri2017membership]. To investigate the effect of different hyperparameters, we use the multiple hyperparameter combinations given in Table 1. In addition, five models are chosen to study the effect of using different classifiers as shadow models on the attack accuracy: Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbour (KNN) and ANN. The hyperparameter choices for all the models are listed in Table 2. In this experiment, target and shadow model combinations are tested in two settings: one-to-one and one-versus-all. In the one-to-one setting, the models are examined against each other using only one shadow model. In the one-versus-all setting, each model is tested against five shadow models, each structured as one of the considered models.
We use the classification accuracy, precision and recall scores of the attack model to evaluate MIA’s performance. We also compute Pearson’s correlation coefficient [pearsons_2019] for each of the properties against the attack accuracy to better understand how they correlate. A positive coefficient suggests that an increase in the property value boosts the attack accuracy, while a negative coefficient indicates the opposite. The higher the absolute value of the coefficient, the stronger the correlation between the evaluated property and the MIA accuracy.
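Pearson's coefficient can be computed without any ML tooling. The sample values below are hypothetical, purely to illustrate a strong positive trend such as the one reported between data size and attack accuracy:

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical property values (e.g. data sizes) vs. attack accuracies:
sizes = [10, 20, 30, 40, 50]
attack_acc = [0.55, 0.60, 0.66, 0.70, 0.74]
print(round(pearson_r(sizes, attack_acc), 3))  # close to +1: strong positive correlation
```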
5 Exploratory Analysis of the Impact of Different Properties on MIA
This section illustrates the experimental results on the performance of MIA with respect to the investigated data and model properties. The acquired results reveal that most of the investigated properties affect MIA’s performance. The Pearson’s correlation coefficients computed between the properties and the attack accuracy also support our findings (Table 3). Among all the properties, data size, balance in the classes, group fairness and the mutual information between the records and the model-generated parameters show a strong correlation with MIA.
5.1 Effect of Different Data Properties
Data Sizes: From the experimental results obtained on all three datasets (Figure 1), it is evident that the adversary has more of an advantage in attacking larger datasets. For the same model structure, regardless of whether the class labels are highly balanced or imbalanced, an increase in the data size enhances the attack accuracy. Table 3 also indicates a strong positive correlation between data size and attack accuracy for all three datasets. The attack accuracy and recall are higher for the datasets with imbalanced class labels than for the balanced ones, while attack precision is lower for the datasets with imbalanced class labels. That means that while more True Positives (TP) are predicted, more False Positives (FP) are also included in the prediction when the classes are imbalanced. However, TPs have more cost than FPs in this problem setting. We obtain a similar trend of attack precision and attack recall in most of the experiments.
Balance in the Classes: Figure 2 shows the attack accuracy for different balances in the class labels for all three datasets. When a dataset has a proper balance in the binary class labels (i.e., 50%), the MIA is less successful. In terms of an ML model’s prediction, this behaviour suggests that a model trained on a dataset with properly balanced classes produces less biased predictions towards one particular class, giving less benefit to the adversary. From Table 3, we can also observe a strong negative correlation between the balance in the classes and the attack accuracy. A similar trend in attack accuracy, precision and recall is observed as described above for the data size results. In addition, the figure shows that although the attack accuracy is lower for the 50% class balance than for the 10% class balance, the lowest attack accuracy is not always achieved with a perfectly balanced class. For example, in the case of the Adult dataset, the lowest attack accuracy is obtained at a class balance other than 50%.
Balance in the Features: The effect of each feature against MIA is different. Figure 3 shows the performance of the attack on the datasets with five selected features from the Purchase, Texas and Adult datasets; there is no consistency in the increase or decrease of the attack accuracy with respect to the increase in the feature balances. However, the attack accuracy is, as expected, lower for the 50% class balance than for the 10% class balance. Though Table 3 shows a positive correlation between the balance in the features and the attack accuracy, it is hard to draw any straightforward conclusion on how the balance in the features affects MIA’s success. The results demonstrate that several features combined while keeping a certain balance among them may prevent membership inference better, which needs further exploration.
Number of Features: Figure 4 presents the observed attack accuracy for the datasets with a gradual increase in the number of features. Although there is no straightforward trend in the attack accuracy, it can be seen from the figure that certain combinations of features in a dataset offer a better defence against MIA. For instance, in the case of the Purchase dataset, the mean attack accuracy for a 2-feature dataset is higher than that for the 16-feature dataset. Further exploration may reveal the actual reason behind this behaviour of different feature combinations.
Entropy: Figure 5 shows the impact of entropy on MIA for the datasets with 10% balance in the classes. The small amount of randomness introduced by only a few features yields high attack accuracies. However, from the results, it is hard to draw any conclusion about the relationship between attack accuracy and entropy, except for the Adult dataset, where the attack accuracy range is lower for low entropy values. We find a similar trend for the 50% balance in the classes as well. Although the correlation between the entropy and the attack accuracy is negative according to Table 3, the experimental results suggest the need for further investigation of this property.
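As a concrete illustration of the entropy property, a discrete feature's Shannon entropy can be computed as follows. This is a generic sketch, not necessarily the estimator used in the experiments.

```python
import numpy as np

def feature_entropy(column):
    """Shannon entropy (in bits) of a discrete feature column."""
    _, counts = np.unique(column, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# a near-constant feature introduces little randomness,
# while a feature spread over many values introduces much more
low = feature_entropy(np.array([0] * 98 + [1] * 2))
high = feature_entropy(np.arange(1000) % 16)
```

Summing or averaging `feature_entropy` over all columns gives one simple dataset-level randomness measure of the kind varied in Figure 5.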
5.2 Effect of Different Model Properties
Selection of Model’s Hyperparameters: Results obtained for the ANN models having 1 to 5 hidden layers are presented in Figure 6 for the Purchase dataset. From the figure, it can be observed that having a higher number of nodes on each layer increases the attack accuracy. There is also a significant increase in the attack accuracy for a slight increase in the learning rate of the target model. The selection of the L2-ratio, which controls the amount of regularization, impacts MIA too: more regularization decreases the attack accuracy. However, beyond a certain point, increasing the regularization value appears to have the opposite effect. Similar results are obtained on the other datasets as well.
Target-Shadow Model Combination: From the results (Table 4), we observe that each model shows a different level of vulnerability against other shadow models. It is also shown that shadow models built to be similar to the target do not guarantee maximum attack accuracy. However, when a target model encounters all five shadow models, the attack accuracy surges for all target models. From the experimental results, it can be seen that the success of MIA is highly classifier-dependent, and a combined attack of multiple shadow models against one target model is more severe.
Mutual Information between Records and Model Parameters: Mutual information between the model-generated parameters and the records can capture the information learned by the model over a dataset. Higher mutual information between them indicates that the model captures more information from the features, which in turn could result in higher vulnerability to membership inference. The experimental results (Figure 8) and the correlation coefficient (Table 3) support this interpretation: for all the datasets, attack accuracy soars for even a small increase in mutual information.
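A simple histogram-based estimate of the mutual information between two quantities (e.g., a record feature and a parameter-derived statistic) can be sketched as follows. The binning scheme and the pairing of records with parameters are illustrative assumptions, not the paper's estimator.

```python
import numpy as np

def mutual_information(x, y, bins=10):
    """Histogram estimate of I(X; Y) in bits for two 1-D samples."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of X
    py = pxy.sum(axis=0, keepdims=True)   # marginal of Y
    nz = pxy > 0                          # avoid log(0)
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(1)
x = rng.normal(size=5000)
independent = mutual_information(x, rng.normal(size=5000))  # near zero
dependent = mutual_information(x, x)                        # large
```

Independent samples give an estimate near zero (up to binning bias), while a fully dependent pair recovers the entropy of the binned variable, matching the intuition that high mutual information means more captured information.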
MIA-indistinguishability: This property is measured as the difference between the member and non-member prediction probabilities; hence, a lower value of MIA-indistinguishability suggests a less vulnerable model. Figure 7 illustrates the results of this experiment. We observe that when the member rate is 50%, the range of MIA-indistinguishability is very low and the attack accuracy is always close to the random guess (50%). For both cases where the member and non-member ratio is unequal, an increase in MIA-indistinguishability boosts the attack accuracy, except for the Texas dataset. These results require a further assessment to verify the relationship between MIA and MIA-indistinguishability. The correlation coefficients between MIA-indistinguishability and attack accuracy indicate a positive correlation, as expected (Table 3).
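Under the definition above, the property reduces to a gap between average prediction confidences on members and non-members. A minimal sketch (the function name and inputs are illustrative):

```python
import numpy as np

def mia_indistinguishability(member_conf, nonmember_conf):
    """Gap between the model's average confidence on training (member)
    and unseen (non-member) records; a lower gap means members are
    harder to distinguish, i.e., a less vulnerable model."""
    return float(abs(np.mean(member_conf) - np.mean(nonmember_conf)))

# a model that is much more confident on its training records
gap = mia_indistinguishability([0.95, 0.90, 0.92], [0.60, 0.55, 0.65])
```

A large gap is exactly the signal a shadow-model attack exploits, which is why an increase in this property tends to boost the attack accuracy.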
Model’s Overfitting: To understand the contribution of overfitting to MIA’s success, we consider multiple hyperparameter settings that exhibit very low overfitting in terms of the difference between train and test accuracy. For instance, Table 5 compares the test accuracy, train accuracy and attack accuracy of the ANN models with 1 and 5 hidden layers, both with and without an L2 regularizer. It is evident from the results that, even though the models are not overfitted, increasing the number of hidden layers increases the attack accuracy. Also, the model with a regularizer expectedly yields a lower attack accuracy than the model with the same structure but no regularization.
Model’s Fairness: Table 3 shows a strong correlation between fairness and attack accuracy. As we measure a model’s fairness as the difference in the model’s predictions for two groups of records, a lower value represents a fairer model. Figure 9 depicts the results of our studies on three fairness notions (group, predictive and individual fairness) for multiple balances in the classes and features of the Purchase dataset. The figure shows that, for a model with higher fairness (a lower difference in prediction probabilities), the attack is comparatively weaker. There is also a significant positive relationship between the balance in the features and classes and the model’s fairness: a model exhibits higher fairness on datasets with properly balanced features and classes.
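Measured this way, group fairness is simply the gap in average predictions between two groups of records. A minimal sketch under that definition (encoding the groups as 0/1 is our assumption):

```python
import numpy as np

def group_fairness_gap(preds, group):
    """Absolute difference in mean prediction between two groups
    (encoded 0/1); 0 means a perfectly group-fair model."""
    preds = np.asarray(preds, dtype=float)
    group = np.asarray(group)
    return float(abs(preds[group == 0].mean() - preds[group == 1].mean()))

fair = group_fairness_gap([0.7, 0.3, 0.7, 0.3], [0, 0, 1, 1])
unfair = group_fairness_gap([0.9, 0.8, 0.2, 0.1], [0, 0, 1, 1])
```

The predictive and individual fairness notions studied in Figure 9 follow the same pattern with different groupings of the predictions.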
Although using smaller datasets limits the fruitfulness of training, the adversarial advantage conferred by huge datasets should be estimated before deploying them on an MLaaS platform. In addition, sampling records while keeping a proper balance in the class labels would reduce the impact of MIA and may foster fairness in the model. Besides the data size and the class balance, the features used in the dataset may also be tested against MIA in several combinations, with different balances among them, to identify an optimal set of features that exposes less information. Among the model properties, hyperparameter selection, the mutual information between the records and the model's parameters, and the model's fairness have the most impact on MIA. However, in reality, it would be extremely difficult to control these properties to minimize information leakage, especially the data properties. Instead, we attempt to elevate the model's resilience against MIA according to the above observations. The next section describes our method and experimental findings on MIA resilience in detail.
6 Towards MIA-resilient ML models
In this section, we propose a new technique that introduces custom regularizers into the model, aiming to strengthen it against MIA with a successful reduction in attack accuracy. To validate the proposed technique, we implement multiple model properties as regularizers and observe the impact on both MIA and the model's prediction. The results demonstrate that these custom regularizers reduce MIA accuracy and, at the same time, improve the model's predictive performance by inducing better generalization of the parameters.
6.1 Tuning the Model for Better Resilience
Compared to overfitting, model properties such as fairness and the mutual information between the records and the model parameters show a greater influence on MIA accuracy. As both of these properties are positively correlated with the attack accuracy, an attempt to minimize them would essentially improve the model's resilience. To explore this, we implement a novel technique that improves the underlying model's performance as well as its resilience against MIA by using property values as regularizers in the model. We study four model properties: the group, predictive and individual fairness, and the mutual information between the records and the model parameters.
Algorithm 2 outlines how a model's group fairness can be used as a regularizer in that model. The algorithm starts by sampling records from the two groups of individuals and obtaining the group fairness according to Equation (8) by training the target model. After repeating these steps several times (steps 2-4 in Algorithm 2), the maximum group fairness of the model is estimated (step 5). Finally, the regularizer function below is used to update the gradients (step 6):
where the normalizing term is the number of parameters generated by the model in one epoch.
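The gradient-update idea behind Algorithm 2 can be sketched with a logistic-regression target model whose gradient adds a scaled group-fairness term. The choice of model, the penalty weight `lam`, and the closed-form fairness gradient below are illustrative assumptions; the paper's exact regularizer function and target model differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_with_fairness_regularizer(X, y, group, lam, lr=0.1, epochs=200):
    """Logistic regression whose gradient adds lam * d(gap)/dw, where
    gap is the group-fairness difference in mean predictions."""
    n = len(y)
    w = np.zeros(X.shape[1])
    g0, g1 = group == 0, group == 1
    for _ in range(epochs):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) / n                # cross-entropy gradient
        gap = p[g0].mean() - p[g1].mean()       # group-fairness gap
        s = p * (1.0 - p)                       # sigmoid derivative
        gap_grad = ((X[g0] * s[g0, None]).mean(axis=0)
                    - (X[g1] * s[g1, None]).mean(axis=0))
        w -= lr * (grad + lam * np.sign(gap) * gap_grad)
    return w

def prediction_gap(X, w, group):
    p = sigmoid(X @ w)
    return abs(p[group == 0].mean() - p[group == 1].mean())

# toy data whose single informative feature tracks the group attribute
rng = np.random.default_rng(0)
group = rng.integers(0, 2, 400)
X = np.c_[group + rng.normal(scale=0.1, size=400), np.ones(400)]
y = group.astype(float)
w_plain = train_with_fairness_regularizer(X, y, group, lam=0.0)
w_fair = train_with_fairness_regularizer(X, y, group, lam=5.0)
```

Setting `lam = 0` recovers plain logistic regression; a positive `lam` pushes the two groups' average predictions together, trading a little training loss for a smaller fairness gap.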
To evaluate the model’s performance, we repeat the steps for 50 epochs and measure the training loss and the group fairness in each epoch. We also estimate the train and test accuracy of the model, and the attack accuracy after performing MIA on the model. We follow a similar algorithm for implementing models with the other properties as regularizers. In these experiments, we use datasets with 100,000 records (10,000 records for the Adult dataset), 5 features and 10% balance in the classes.
6.2 Experimental Evaluation of the Proposed Regularizers
For all four studied regularizers, the obtained results show success both in improving the model’s performance and in reducing MIA accuracy. The attack accuracy values reported in Table 6 show that the lowest attack accuracy is achieved by one of the proposed regularizers, compared to the model without any regularizer or with L1 or L2 regularization. As expected, all the regularizers perform significantly better at reducing attack accuracy than using no regularizer. The train and test accuracy are also improved substantially (Table 6). Figure 10 shows the decrease in the model’s loss in each epoch, where, for all the datasets, the group and predictive fairness regularizers achieve the minimum prediction loss compared to the other regularizers. Furthermore, in terms of reducing the model’s unfairness, Figure 11 makes it evident that models with the group, predictive and individual fairness regularizers ensure a higher level of group fairness than the L1 and L2 regularizers.
7 Conclusion & Future Works
Complete prevention of the inevitable disclosure of information from an ML model through MIA is not yet achievable. However, an optimal model that leaks minimal information can be ensured by carefully adopting both the model and data properties. In this paper, several selected data and model properties are analysed against MIA to monitor how the attack accuracy is influenced by different property values. Our investigation shows that larger datasets with unbalanced classes are the most vulnerable to MIA. On the other hand, choosing the right model with a proper hyperparameter setting can reduce the vulnerability. An ML model that learns too much about the training data, producing higher mutual information between the records and the model’s parameters, increases the attack accuracy. We also find that a fairer model shows better resilience against MIA. However, a few of the explored properties, such as the balance in the feature values, entropy and MIA-indistinguishability, need further exploration to explain the obtained experimental results and determine their contribution to MIA’s success.
We further study how these observations can be utilised to strengthen a model before allowing public access to it. We apply model properties, namely the group, predictive and individual fairness and the mutual information between the records and the parameters, as regularizers in the model. According to our experimental results, all four regularizers reduce the attack accuracy as well as the model’s training loss. The results also demonstrate improved group fairness of the model compared to the model without any regularizer and with other standard regularizations (L1 and L2 norm).
In this research, we only consider MIA as the adversarial attack on the ML model, while other variants of black-box attacks, such as the model inversion attack [veale2018algorithms] and attribute inference attacks [attriguard], are yet to be studied. In addition, it is necessary to study the information leakage risks in other variants of ML algorithms, such as reinforcement learning and agent-based modelling. Moreover, as the defences are model-specific, further research needs to be conducted towards formulating a comprehensive defence against MIA that is not bound to the type of the target model. Hence, the future direction of this research can be described as a three-fold study:
Further exploration of other data and model properties and their impact on different black-box and white-box adversarial attacks;
Evaluating the effectiveness of adversarial attacks on the range of ML techniques for supervised and unsupervised learning in the case of both centralised and federated settings;
Investigating the practicality of the existing defences for the above-mentioned scenarios in order to develop an optimal attack-resistant ML model and, preferably, a model-independent defence mechanism.