1 Introduction
Cyberspace users cannot easily avoid the possibility of their identity being incorporated in data that exposes various aspects of their lives [1, 2]. Our day-to-day activities are constantly tracked by smart devices, and the unavoidable exposure of personally identifiable information (PII), such as fingerprints and facial features, can lead to massive privacy loss. The heavy use of PII in social networks, in the healthcare industry, by insurance companies, and in smart grids makes privacy protection of PII extremely complex. The literature offers a range of methods to address the growing concerns related to user privacy. Among these methods, disclosure control of microdata has become widely popular in the domain of data mining [1, 3]; it works by applying different privacy-preserving mechanisms to the data before releasing them for analysis. Privacy-preserving data mining (PPDM) applies disclosure control to data mining in order to preserve privacy while generating knowledge [1].
The main approaches to PPDM are data perturbation (modification) and encryption; literature shows a plethora of privacy preservation approaches under these two categories [4]. There has been more interest in data perturbation due to its lower complexity compared to encryption. Additive perturbation, random rotation, geometric perturbation, randomized response, random projection, microaggregation, hybrid perturbation, data condensation, data wrapping, data rounding, and data swapping are some examples of basic data perturbation algorithms, which show different behavior on different applications and datasets [5, 6, 7, 8, 9, 10, 11]. We can also find a number of hybrid approaches that combine basic perturbation approaches.
As shown in Figure 1, the availability of many privacy preservation approaches has its drawback: the selection of the optimal perturbation algorithm for a particular problem can be quite complex. The figure shows different constraints that need to be considered in choosing the best possible privacy preservation algorithm for a particular application and dataset. The different characteristics of privacy models (e.g. k-anonymity, l-diversity, t-closeness, differential privacy [4]), the different properties of privacy preservation algorithms (e.g. geometric perturbation, data condensation, randomized response), the different dynamics of the input data (e.g. the statistical properties, the dimensions), and the different types of applications at hand (e.g. data clustering, deep learning) are examples of the attributes that influence the effectiveness of privacy preservation and the usability of the results. At the same time, this diversity enables the selection of the privacy preservation algorithm that best suits a particular application. There is no generic approach to identify the exact levels of privacy loss vs. utility loss, given a list of privacy preservation algorithms on particular applications and datasets.
Furthermore, many privacy preservation approaches fall out of favour because their applicability is not properly identified. We introduce a new approach named “Privacy Preservation as a Service” (PPaaS) that employs a novel strategy to apply customized perturbation based on the requirements of the problem at hand and the characteristics of the input dataset.
PPaaS presents a unified service that understands the needs of data requesters and the requirements of data owners (who have full access privileges to the raw input databases, which are represented by the lowest layer in Figure 2); it can facilitate privacy-preserving data sharing and can identify the best privacy preservation approach. An appropriate set of performance and security metrics describes the quality of such a service, which is used to tailor the best privacy preservation to stakeholders’ needs. The proposed framework collects efficient privacy preservation methods into a pool and applies the approach that best suits both data owner and data requester to the data before making the data available.
1.1 Rationale and technical novelty
Developing generic privacy-preserving methods for data mining and statistics purposes is challenging due to the large number of constraints that need to be considered. As the complexity of the applications increases, generic approaches often end up with low utility or low privacy ([12]). Many researchers try to overcome this by focusing on a distinct objective (e.g. privacy in deep learning) ([13, 14, 12]). As a result, there are a number of algorithms for some areas such as deep learning, with many viable privacy preservation solutions ([15]). Because each algorithm has unique features and characteristics, choosing the best one for a particular case can be highly complex.
PPaaS reduces the burden of choosing the optimal privacy-preserving algorithm and providing the best protection for the application and dataset at hand by introducing a unified service for the purpose. Since there can be more than one method appropriate for a particular application and dataset, empirical evaluation is utilised in this process. PPaaS manages a pool of algorithms suitable for particular applications. When a certain application/dataset is presented, PPaaS assesses the privacy-preserving algorithms and produces a unified metric named fuzzy index (FI) derived from a fuzzy model (which can be used to model the vagueness and impreciseness of information in a real-world problem using fuzzy sets). We use quantitative definitions of utility and privacy as inputs to the fuzzy model. The higher the fuzzy index, the better the balance between privacy and utility under the given circumstances. The release of a particular output depends on a configurable threshold value of the corresponding FI. If the required threshold is not reached, the application of the corresponding pool is reassessed until one of the privacy preservation algorithms in the pool generates a satisfactory FI (one that reaches the threshold) for the application and dataset. With this approach, users receive the best privacy preservation available in the pool while retaining as much utility as possible.
2 Literature
Data privacy focuses on impeding the estimation of the original data from the sanitized data, while utility concentrates on preserving application-specific properties and information ([16]). It has been noted that privacy preservation mechanisms decrease utility in general, i.e. they reduce utility to improve privacy, and finding a trade-off between privacy protection and data utility is an important issue ([17]). In fact, privacy and utility are often conflicting requirements: privacy-preserving algorithms provide privacy at the expense of utility. Privacy is often preserved by modifying or perturbing the original data, and a common way of measuring the utility of a privacy-preserving method is to investigate perturbation biases ([18]). This bias is the difference between the result of a query on the perturbed data and the result of the same query on the original data. Wilson et al. examined different data perturbation methods and identified Type A, B, C, and D biases, along with an additional bias named Data Mining (DM) bias ([18]). Type A bias occurs when the perturbation of a given attribute causes summary measures to change. Type B bias is the result of the perturbation changing the relationships between confidential attributes, while in the case of Type C bias, the relationship between confidential and non-confidential attributes changes. Type D bias means that the underlying distribution of the data was affected by the sanitization process. If Type DM bias exists, data mining tools will perform less accurately on the perturbed data than they would on the original dataset.
An investigation of existing privacy preservation approaches also suggests that they often suffer from utility or privacy issues when they are considered for generic applications ([4]). Methods such as additive perturbation with noise can produce low utility due to the highly randomized nature of the added noise ([19, 8]). Randomized response, another privacy preservation approach, has the same issue and produces low utility data due to high randomization ([9]). Methods such as multivariate microaggregation provide low usability due to the complexity introduced by their NP-hard nature ([5]). Data condensation provides an efficient solution to privacy preservation of data streams; however, the quality of data degrades as the data grows, eventually leading to low utility ([20]). Many of the multidimensional approaches, such as rotation perturbation and geometric perturbation, introduce high computational complexity and take an unacceptably long time to execute ([21, 22]). This means that such methods in their default settings are not feasible for high-dimensional data such as big data and data streams. A structured approach is needed, which can provide a practically applicable solution for selecting the best privacy preservation approach for a given application or dataset.
Several works have looked at the connection between privacy, utility, and usability. Bertino et al. proposed a framework for evaluating privacy-preserving data mining algorithms; for each algorithm, they focused on assessing the quality of the sanitized data ([20]). Other frameworks aim at providing environments for dealing with sensitive data. Sharemind is a shared multi-party computation environment allowing secret data sharing ([23]). FRAPP is a matrix-theoretic framework aimed at helping the design of privacy-preserving random perturbation schemes ([24]). Thuraisingham et al. went one step further; they provide a vision for designing a framework that measures both the privacy and utility of multiple privacy-preserving techniques. They also provide insight into balancing privacy and utility in order to provide better privacy preservation ([25]). However, these frameworks neither provide a solution to the problem of dealing with numerous privacy preservation algorithms nor provide proper quantification of their utility and privacy against a particular application and dataset at hand.
3 Privacy Preservation as a Service
We propose a novel approach named “Privacy Preservation as a Service (PPaaS)”, a generic framework that can be used to sanitize big data in a granular and application-specific manner. In this section, we give a detailed outline of the concept. The high diversity and specificity of privacy preservation methods present complexities, such as finding a trade-off between security, utility, and usability. As noted previously, privacy preservation algorithms can suffer from different types of biases. For example, a particular sanitization algorithm used for privacy-preserving classification may not have DM bias, but it may suffer from Type B and D biases, while another one has only Type B bias, and a third one has DM bias. Different applications may tolerate different types of bias, and there is no general rule. This means that different privacy preservation algorithms are suitable for different data owner requirements (privacy and performance) and different data requester needs (utility and usability).
A unified service of data sanitization for big data can provide an interactive solution for this problem. PPaaS can choose the most suitable privacy preservation algorithm for a particular analysis at hand. The architecture of PPaaS is presented in Figure 2. It is implemented as a web-based framework that can operate in a web service cluster. The scalability necessary for big data processing is achieved using APIs such as Spark/PySpark ([26]), as the primary language is Python, with a clean build design adapted to a Model-View-Controller (MVC) web framework. As the figure shows, the framework consists of three distinct components: (1) the raw datasets/databases, (2) the PPaaS privacy preservation module, and (3) the users (e.g. analysts), who work with the sanitized (perturbed) data.
The privacy preservation module consists of pools of application logic (e.g. classification and association mining) and pools of privacy preservation algorithms (e.g. matrix multiplication, additive perturbation). The PPaaS privacy preservation module integrates a collection of privacy preservation algorithms into a collection of pools where each pool represents a particular class of data mining/analysis algorithms. The enlargement of the red circle in Figure 2 shows a possible collection of subpools of privacy preservation algorithms for classification. For instance, rotation perturbation (RP) ([27]) can be integrated into the "Generic" subpool of pool1: Classification (refer to the red circle in Figure 2), as it provides better accuracy towards a collection of classification algorithms. A particular pool may have several subdivisions to enable the synthesis of new data sanitization methods that are tailored to more specific requirements. The database management layer provides the necessary services for uniform data formatting. It also represents a common platform for the application of different privacy preservation algorithms. (In the proposed concept, privacy preservation is discussed in terms of data perturbation. The following sections use "privacy preservation" and "perturbation" interchangeably, referring to the same objective.) The blue arrows in Figure 2 show the data flow from data owners through the database management layer to the sanitization algorithm.
A data owner/curator can utilize the framework to impose privacy on a particular dataset for a particular application by using the best privacy preservation approach from a pool of available algorithms. In the proposed setting, PPaaS requires a trusted curator to identify the query or the analysis requests for a given dataset, and run the PPaaS logic for the corresponding application (e.g. deep learning ([28])). The curator/data owner accesses the data and applies privacy preservation (perturbation) to the data or dataset according to the users’ requirements.
The proposed framework has three key aspects: (1) understanding the data owner/producer requirements (privacy), (2) understanding the data requester/consumer needs (utility), and (3) selecting and applying the optimum privacy-preserving algorithm to the data. Finally, the progress of applying privacy preservation to a particular dataset is assessed using a fuzzy metric (named the fuzzy index or FI), which is a single metric to evaluate the balance between privacy and utility provided by the corresponding privacy preservation algorithm. Figure 3 shows the main flow of PPaaS in releasing a perturbed dataset with a customized application of privacy preservation. The data curator will receive a request for a certain operation on the underlying dataset. For example, this request can be for deep learning on a medical dataset that is maintained by the corresponding data owner. The data owner forwards the request to the PPaaS framework, which will select the corresponding pool/subpool of privacy preservation algorithms allocated under deep learning. In the example, this pool may include the following algorithms: local differentially private approaches, geometric data perturbation approaches, and random projection-based data perturbation approaches, which are suitable for producing high utility for deep learning. Next, PPaaS sequentially applies the corresponding pool of privacy preservation algorithms and generates a fuzzy index for each perturbation algorithm. If a particular pool has four privacy preservation algorithms, PPaaS will produce four FI values. Next, PPaaS will select the perturbed dataset with the highest FI, because the corresponding dataset provides the best balance between privacy and utility. The data curator is able to handle different data sources and sanitize them for requests based on the specific needs of a particular requester.
PPaaS uses a fuzzy inference system (FIS) to generate the fuzzy index. Privacy and utility are the only inputs to the FIS, which generates a final score, that is, the fuzzy index (FI). FI is a quantitative rank that rates the complete process of privacy preservation upon a particular dataset for a given application. A heuristic approach was followed in defining the fuzzy rules, which focused on maintaining a balance between privacy and utility. The universe of discourse of the inputs and the output ranges from 0 to 1. A higher FI value suggests that the final dataset has high privacy and utility with a good balance between them. The PPaaS dispatcher investigates the value of FI corresponding to a particular process of sanitization and compares it with a user-defined balance guarantee (a threshold) that is taken as an input parameter from the data owner. If the maximum FI generated by the pool reaches this threshold, the dataset will be released to the data requester. Otherwise, PPaaS will reapply the random perturbation algorithm to find a better solution that satisfies the requirement.

(1) 
A fuzzy inference system (FIS) takes several inputs and generates a certain output by evaluating a collection of specified rules, which are called fuzzy rules. In the proposed framework (PPaaS), we define a FIS that takes two inputs, privacy and utility, to produce an output named fuzzy index (FI). FI provides an impression of the quality of the balance between privacy and utility generated after perturbing a dataset using a privacy preservation algorithm. From domain knowledge, we already know that a good privacy preservation algorithm should enforce high privacy while producing good utility (e.g. accuracy). Following this notion, FI should ideally provide high values only when both privacy and utility are high. In case one is high and the other is low, FI should be a lower value. Hence, the fuzzy model should produce a rule surface as presented in Figure 5. Considering all these dynamics between privacy, utility, and FI, we introduced three membership functions (LOW, MEDIUM, HIGH) for each variable. Next, we considered Gaussian functions for all the membership functions of the two input variables and the output variable, as shown in Figure 4. Finally, we defined the nine rules given in Equation 1 to obtain the rule surface depicted in Figure 5.
Figure 5 shows the rule surface of the fuzzy inference system (FIS), which is used to generate FI. As shown in the figure, the FIS generates higher values for FI when both utility and privacy are high, whereas for lower values of privacy and utility FI also stays at a lower level. The rule surface also ensures that a high value of only one parameter (privacy or utility) does not result in a high value for FI. This property helps the proposed PPaaS framework maintain a good balance between privacy and utility.
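The behaviour described above can be illustrated with a minimal Mamdani-style sketch in plain NumPy. Only the overall structure comes from the text (two inputs, one output, three Gaussian membership functions per variable, nine rules, universe of discourse 0 to 1). The membership centres and spread, the rule table (the output level is taken as the weaker of the two input levels), the min/max operators, and centroid defuzzification are all assumptions chosen for illustration; they are not the rules of Equation 1, and the values produced are not expected to match the FI values reported later.

```python
import numpy as np

# Assumed membership parameters (not taken from the paper).
CENTRES = {"LOW": 0.0, "MEDIUM": 0.5, "HIGH": 1.0}
SIGMA = 0.2
ORDER = ["LOW", "MEDIUM", "HIGH"]

# Hypothetical rule table with nine rules: the output level is the weaker
# of the privacy level and the utility level.
RULES = {
    (p, u): ORDER[min(ORDER.index(p), ORDER.index(u))]
    for p in ORDER
    for u in ORDER
}


def gauss(x, centre, sigma=SIGMA):
    """Gaussian membership degree of x for a set centred at `centre`."""
    return np.exp(-0.5 * ((x - centre) / sigma) ** 2)


def fuzzy_index(privacy, utility, resolution=1001):
    """Return a score in [0, 1] for privacy and utility values in [0, 1]."""
    y = np.linspace(0.0, 1.0, resolution)  # output universe of discourse
    aggregated = np.zeros_like(y)
    for (p_level, u_level), out_level in RULES.items():
        # Firing strength of the rule: fuzzy AND implemented as min.
        strength = min(gauss(privacy, CENTRES[p_level]),
                       gauss(utility, CENTRES[u_level]))
        # Clip the output set at the firing strength, aggregate with max.
        clipped = np.minimum(strength, gauss(y, CENTRES[out_level]))
        aggregated = np.maximum(aggregated, clipped)
    # Centroid defuzzification.
    return float(np.sum(y * aggregated) / np.sum(aggregated))


if __name__ == "__main__":
    for p, u in [(1.0, 1.0), (1.0, 0.0), (0.5, 0.5), (0.0, 0.0)]:
        print(p, u, round(fuzzy_index(p, u), 4))
```

With this assumed rule table, a high score requires both inputs to be high, which mirrors the property attributed to the rule surface in Figure 5.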
Privacy Quantification.
During the application of each privacy preservation algorithm, privacy is quantified empirically using a multi-column privacy metric, considering that the input datasets are n-dimensional matrices. In the proposed setting, we assume that all the attributes of a particular dataset are equally important, and we ensure this by applying z-score normalization to the input datasets. Then we calculate the variance ($Var(P)$) of the difference between the perturbed and non-perturbed datasets. The higher the $Var(P)$, the higher the privacy, as $Var(P)$ indicates the difficulty of estimating the original data from the perturbed data ([4]). $Var(P)$ is a well-established approach used to measure the level of privacy of perturbed data ([4]). If $X^p$ is a perturbed data series of attribute $X$, the level of privacy of the perturbation method can be measured using $Var(P)$, where $P = X^p - X$. For a series of $m$ records with differences $p_1, \dots, p_m$ and mean $\bar{p}$, $Var(P)$ can be given by Equation 2.

$$Var(P) = \frac{1}{m}\sum_{i=1}^{m}\left(p_i - \bar{p}\right)^2 \qquad (2)$$

Given that there are $n$ attributes in a particular dataset, we consider the minimum privacy guarantee to be the minimum variance ($\min(Var(P))$) across all the attributes in the corresponding dataset. $\min(Var(P))$ is the level of privacy of the weakest attribute in a perturbed dataset. Equation 3 shows the generation of the minimum privacy guarantee (the minimum variance, $\min(Var(P))$) for a particular dataset.

$$\min(Var(P)) = \min_{1 \le j \le n} Var(P_j) \qquad (3)$$

Assuming that a particular pool has $k$ privacy preservation algorithms, we scaled the $\min(Var(P))$ values to lie within 0 and 1 by applying Equation 4 to the corresponding pool. The value returned from Equation 4 is considered as the input to the FIS (which accepts inputs in the range $[0, 1]$).

$$\text{scaled}_a = \frac{\min(Var(P))_a}{\max_{1 \le b \le k} \min(Var(P))_b}, \quad a = 1, \dots, k \qquad (4)$$
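The privacy quantification steps can be sketched in a few lines of NumPy: normalize the input with z-scores, take the per-attribute variance of the difference between the perturbed and the normalized data, keep the minimum over attributes, and scale by the largest value in the pool. The function names, the toy data, and the use of the population variance (`ddof=0`) are assumptions made for this illustration, not details confirmed by the text.

```python
import numpy as np


def zscore(data):
    """Column-wise z-score normalisation; assumes no attribute is constant."""
    return (data - data.mean(axis=0)) / data.std(axis=0)


def min_privacy_guarantee(normalized, perturbed):
    """Smallest per-attribute variance of (perturbed - normalized)."""
    difference = perturbed - normalized
    # Population variance (ddof=0) is an assumption of this sketch.
    return float(np.var(difference, axis=0).min())


def scale_to_pool(guarantees):
    """Divide each guarantee by the largest one in the pool.

    Assumes at least one guarantee is strictly positive.
    """
    guarantees = np.asarray(guarantees, dtype=float)
    return guarantees / guarantees.max()


# Toy input: 4 records, 2 attributes (hypothetical values).
raw = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
normalized = zscore(raw)

# Two stand-in "perturbation algorithms", modelled as fixed offsets so the
# example is deterministic.
noise_a = np.array([[0.1, 1.0], [-0.1, -1.0], [0.1, 1.0], [-0.1, -1.0]])
noise_b = np.array([[0.5, 0.2], [-0.5, -0.2], [0.5, 0.2], [-0.5, -0.2]])

pool = [min_privacy_guarantee(normalized, normalized + noise)
        for noise in (noise_a, noise_b)]
scaled = scale_to_pool(pool)
print(pool, scaled)
```

In this toy pool the first algorithm leaves one attribute almost unchanged, so its minimum guarantee is small even though its other attribute is heavily perturbed; taking the minimum over attributes is what exposes that weakest attribute.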
Utility Quantification.
The accuracy of the results produced by the requested service is evaluated experimentally to generate the empirical utility. If the application being examined is classification, the classification accuracy is generated for all the privacy preservation algorithms in the pool for the corresponding type of data classification. All the accuracy (utility) values are scaled between 0 and 1, as the range of inputs accepted by the FIS is bounded by the window of [0, 1].
Algorithm for generating FI
Algorithm 1 is used for generating FI for a particular pool of privacy preservation algorithms.
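The overall flow described in this section (score every algorithm in a pool, pick the highest FI, release only if the threshold is reached) might be sketched as follows. This is not Algorithm 1 itself: the function name `select_from_pool`, its signature, and the stand-in scoring function `weakest_link` are hypothetical, and the fuzzy inference system is passed in as a callable so that the sketch stays self-contained. The privacy and utility numbers are the LRDS values from the first utility column of Table II.

```python
from typing import Callable, Dict, Optional, Tuple


def select_from_pool(
    privacy: Dict[str, float],   # unscaled minimum privacy guarantee per algorithm
    utility: Dict[str, float],   # utility (e.g. accuracy) in [0, 1] per algorithm
    fuzzy_index: Callable[[float, float], float],
    threshold: float,
) -> Tuple[Optional[str], Dict[str, float]]:
    """Return (released algorithm or None, FI score per algorithm)."""
    pool_max = max(privacy.values())
    scores = {
        name: fuzzy_index(privacy[name] / pool_max, utility[name])
        for name in privacy
    }
    best = max(scores, key=scores.get)
    # Release only when the best score reaches the owner's threshold.
    released = best if scores[best] >= threshold else None
    return released, scores


def weakest_link(privacy: float, utility: float) -> float:
    """Stand-in for the FIS (an assumption): score by the weaker input."""
    return min(privacy, utility)


privacy = {"RP": 0.8750, "GP": 1.3248, "PABIDOT": 1.4046, "SEAL": 1.4061}
utility = {"RP": 0.7404, "GP": 0.7912, "PABIDOT": 0.7822, "SEAL": 0.8059}

released, scores = select_from_pool(privacy, utility, weakest_link, threshold=0.8)
print(released, scores)
```

When no algorithm reaches the threshold the sketch simply returns `None`; in the framework as described, that outcome triggers another round of perturbation rather than a release.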
4 Case Studies and Results
In this section, we demonstrate how PPaaS selects the best perturbation algorithm and a perturbed dataset from a particular pool of algorithms. During the experiments, we consider five classification algorithms: Multilayer perceptron (MLP), k-nearest neighbor (IBK), Sequential Minimal Optimization (SVM), Naive Bayes, and J48 ([29]). We use four privacy preservation algorithms: rotation perturbation (RP), geometric perturbation (GP), PABIDOT, and SEAL ([4]), which are benchmarked for utility for the selected classification algorithms ([4]). The algorithms were tested on five different datasets retrieved from the UCI machine learning data repository (http://archive.ics.uci.edu/ml/index.php). Table I provides a summary of the datasets. All the experiments were run on a Windows 7 (Enterprise 64-bit, Build 7601) computer with an Intel(R) i7-4790 (4th generation) CPU (8 cores, 3.60 GHz) and 8 GB RAM.

Table I: Summary of the datasets.

| Dataset | Abbreviation | Number of Records | Number of Attributes | Number of Classes | Source |
|---|---|---|---|---|---|
| Wholesale customers | WCDS | 440 | 8 | 2 | https://archive.ics.uci.edu/ml/datasets/Wholesale+customers |
| Wine Quality | WQDS | 4898 | 12 | 7 | https://archive.ics.uci.edu/ml/datasets/Wine+Quality |
| Page Blocks Classification | PBDS | 5473 | 11 | 5 | https://archive.ics.uci.edu/ml/datasets/Page+Blocks+Classification |
| Letter Recognition | LRDS | 20000 | 17 | 26 | https://archive.ics.uci.edu/ml/datasets/Letter+Recognition |
| Statlog (Shuttle) | SSDS | 58000 | 9 | 7 | https://archive.ics.uci.edu/ml/datasets/Statlog+%28Shuttle%29 |
In the proposed experimental setting, we consider 25 case studies where each case study considers one of the five classification algorithms and one of the five datasets. We consider a pool of four data perturbation algorithms (RP, GP, PABIDOT, and SEAL) under each of the case studies, represented as CS (CS stands for "case study") in Table II. Next, we evaluated the performance of each privacy preservation algorithm in each case to generate the ranks (fuzzy indices, FIs) and recorded them in Table III. Table II shows the classification accuracy and the minimum privacy guarantee produced for each pool of privacy preservation algorithms. In each pool, the input datasets were perturbed using the four privacy preservation algorithms. Then the perturbed data were analysed by each classification algorithm to generate classification accuracy (utility) values. Table II includes the min(std(P)) values generated as explained before. We considered the standard deviation (std(P)) of the difference between the original normalized data and the perturbed data. To keep the values within the 0 to 1 range for the fuzzy input (for privacy), we applied Equation 4 to the min(std(P)) values, where the divisor is the maximum standard deviation value returned by the corresponding pool of privacy preservation algorithms.

Table II: Utility after privacy preservation (five utility columns, one per classification algorithm) and privacy guarantee (min(std(P)) and its scaled value).

| Dataset | Algorithm | Utility | Utility | Utility | Utility | Utility | min(std(P)) | Scaled |
|---|---|---|---|---|---|---|---|---|
| LRDS | RP | 0.7404 | 0.8719 | 0.7107 | 0.4841 | 0.6489 | 0.8750 | 0.6223 |
| LRDS | GP | 0.7912 | 0.9305 | 0.7792 | 0.5989 | 0.7054 | 1.3248 | 0.9422 |
| LRDS | PABIDOT | 0.7822 | 0.9224 | 0.7848 | 0.6280 | 0.7262 | 1.4046 | 0.9989 |
| LRDS | SEAL | 0.8059 | 0.9367 | 0.8171 | 0.6310 | 0.8528 | 1.4061 | 1.0000 |
| PBDS | RP | 0.9200 | 0.9552 | 0.8999 | 0.3576 | 0.9561 | 0.7261 | 0.5149 |
| PBDS | GP | 0.9024 | 0.9567 | 0.8993 | 0.4310 | 0.9549 | 0.2845 | 0.2017 |
| PBDS | PABIDOT | 0.9583 | 0.9476 | 0.9209 | 0.8968 | 0.9492 | 1.4102 | 1.0000 |
| PBDS | SEAL | 0.9634 | 0.9673 | 0.9559 | 0.8697 | 0.9634 | 1.3900 | 0.9857 |
| SSDS | RP | 0.9626 | 0.9980 | 0.8821 | 0.6904 | 0.9951 | 1.2820 | 0.8847 |
| SSDS | GP | 0.9873 | 0.9981 | 0.7841 | 0.7918 | 0.9959 | 1.4490 | 1.0000 |
| SSDS | PABIDOT | 0.9865 | 0.9867 | 0.9280 | 0.9134 | 0.9874 | 1.4058 | 0.9702 |
| SSDS | SEAL | 0.9970 | 0.9921 | 0.9851 | 0.8994 | 0.9987 | 1.4065 | 0.9707 |
| WCDS | RP | 0.8909 | 0.8500 | 0.8227 | 0.8455 | 0.8682 | 1.0105 | 0.6912 |
| WCDS | GP | 0.9182 | 0.8659 | 0.8500 | 0.8432 | 0.8886 | 1.4620 | 1.0000 |
| WCDS | PABIDOT | 0.9045 | 0.8545 | 0.8841 | 0.8886 | 0.8841 | 1.3680 | 0.9357 |
| WCDS | SEAL | 0.8932 | 0.8682 | 0.8909 | 0.8841 | 0.8659 | 1.3130 | 0.8981 |
| WQDS | RP | 0.4765 | 0.5329 | 0.4488 | 0.3232 | 0.4553 | 1.2014 | 0.8570 |
| WQDS | GP | 0.4886 | 0.5688 | 0.4488 | 0.3216 | 0.4643 | 1.3463 | 0.9603 |
| WQDS | PABIDOT | 0.5412 | 0.6182 | 0.5147 | 0.4657 | 0.4916 | 1.4019 | 1.0000 |
| WQDS | SEAL | 0.5392 | 0.6402 | 0.5202 | 0.4783 | 0.8415 | 1.3834 | 0.9868 |
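As a quick consistency check, the scaled privacy values in Table II can be reproduced by dividing each min(std(P)) value by the largest value in the same pool, which is how the text describes Equation 4. The short script below does this for the LRDS and PBDS pools; the dictionary layout and the helper name `scale_pool` are illustrative choices only.

```python
# min(std(P)) values copied from Table II for two of the five datasets.
min_std = {
    "LRDS": {"RP": 0.8750, "GP": 1.3248, "PABIDOT": 1.4046, "SEAL": 1.4061},
    "PBDS": {"RP": 0.7261, "GP": 0.2845, "PABIDOT": 1.4102, "SEAL": 1.3900},
}


def scale_pool(pool):
    """Divide every value by the pool maximum and round to four decimals."""
    top = max(pool.values())
    return {name: round(value / top, 4) for name, value in pool.items()}


scaled = {dataset: scale_pool(pool) for dataset, pool in min_std.items()}
for dataset, values in scaled.items():
    print(dataset, values)
```

The rounded results agree with the "Scaled" column of Table II for both pools, for example 0.8750 / 1.4061 ≈ 0.6223 for RP on LRDS.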
The values in Table II are evaluated using the proposed fuzzy model to generate the ranks for each privacy preservation algorithm and perturbed dataset, as given in Table III. The highest ranks generated in each pool of algorithms are shown in bold. Although SEAL has the best performance results in many cases, the table clearly shows that the input dataset and the choice of application (e.g. classification) play a major role in selecting the best privacy preservation approach.
5 Discussion
In this paper, we proposed a new paradigm named privacy preservation as a service (PPaaS) to improve the application of privacy on a dataset or application, eventually improving the utility of existing and new privacy preservation approaches. The domain of data privacy contains a plethora of different privacy preservation approaches that are proposed for different types of applications. Consequently, it is a highly complex process to identify the best possible privacy preservation approach for a particular application. PPaaS provides a solution by introducing a service-oriented framework that collects existing privacy preservation approaches and semantically categorizes them into pools of applications. Developers of new privacy preservation algorithms can introduce their methods to the PPaaS framework and add them to the corresponding pools of applications. When a data owner/curator wants to apply privacy preservation to a particular dataset, PPaaS will rank the methods in the relevant pools of applications with respect to the dataset. The ranks are expressed in the form of a fuzzy index (FI). FI is generated using a fuzzy inference system that takes two inputs, privacy and utility. PPaaS quantifies privacy in terms of the variance of the difference between the input data and the perturbed data ($Var(P)$). PPaaS considers the concept of minimum privacy guarantee ($\min(Var(P))$), where the minimum of the per-attribute variances $Var(P_1)$ to $Var(P_n)$ is considered. $\min(Var(P))$ is the strength of the weakest attribute in a perturbed dataset, and is called the minimum privacy guarantee. The utility is the accuracy measured under the corresponding application. For example, when the application is data classification, PPaaS considers classification accuracy as the utility measurement. PPaaS will select the privacy preservation approach or the perturbed dataset that returns the highest FI, which represents the case with the best balance between privacy and utility.
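Table III: FI rank values returned under each CS (five FI columns, one per classification algorithm). The highest value in each pool is in bold.

| Dataset | Algorithm | FI | FI | FI | FI | FI |
|---|---|---|---|---|---|---|
| LRDS | RP | 0.5107 | 0.5068 | 0.5091 | 0.4999 | 0.5072 |
| LRDS | GP | 0.6382 | 0.8156 | 0.6203 | 0.5036 | 0.5391 |
| LRDS | PABIDOT | 0.6247 | 0.8093 | 0.6286 | 0.5078 | 0.5560 |
| LRDS | SEAL | **0.6608** | **0.8201** | **0.6782** | **0.5083** | **0.7315** |
| PBDS | RP | 0.5001 | 0.5001 | 0.5001 | 0.4891 | 0.5001 |
| PBDS | GP | 0.3509 | 0.3509 | 0.3509 | 0.3509 | 0.3509 |
| PBDS | PABIDOT | 0.8334 | 0.8272 | 0.8081 | **0.7856** | 0.8282 |
| PBDS | SEAL | **0.8360** | **0.8379** | **0.8321** | 0.7541 | **0.8360** |
| SSDS | RP | 0.7723 | 0.7723 | 0.7693 | 0.5296 | 0.7723 |
| SSDS | GP | **0.8462** | **0.8499** | 0.6275 | 0.6391 | **0.8492** |
| SSDS | PABIDOT | 0.8393 | 0.8393 | 0.8137 | **0.8016** | 0.8393 |
| SSDS | SEAL | 0.8395 | 0.8395 | **0.8395** | 0.7882 | 0.8395 |
| WCDS | RP | 0.5301 | 0.5301 | 0.5301 | 0.5301 | 0.5301 |
| WCDS | GP | **0.8058** | 0.7492 | 0.7275 | 0.7178 | **0.7767** |
| WCDS | PABIDOT | 0.7933 | 0.7339 | 0.7716 | **0.7767** | 0.7716 |
| WCDS | SEAL | 0.7818 | **0.7522** | **0.7793** | 0.7716 | 0.7492 |
| WQDS | RP | 0.4998 | 0.5003 | 0.4992 | 0.4773 | 0.4994 |
| WQDS | GP | 0.5000 | 0.5014 | 0.4993 | 0.4765 | 0.4997 |
| WQDS | PABIDOT | **0.5004** | 0.5061 | **0.5001** | 0.4997 | 0.5000 |
| WQDS | SEAL | **0.5004** | **0.5103** | **0.5001** | **0.4999** | **0.7153** |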
We ran experiments with PPaaS using five different datasets, five different classification algorithms, and four different privacy preservation algorithms that are benchmarked to produce good utility over the corresponding classification algorithms. Our experiments show that the four privacy preservation algorithms are ranked differently based on the application and the input dataset. The highest values of FI indicate the highest privacy and utility with the best balance between them. After comparing the FI values (available in Table III) generated using the values available in Table II, we can conclude that FI provides high values if and only if both the utility and the privacy returned by the corresponding method are high. In all other cases, the fuzzy inference system (FIS) produces lower values for FI. Hence, FI enables PPaaS to identify the best perturbed dataset generated by the most suitable privacy preservation algorithm for the corresponding pool of algorithms and for the input dataset.
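The selection step can be made concrete with the SSDS rows of Table III. The snippet below picks, for each of the five FI columns, the algorithm with the highest FI; the column index is positional because it only refers to the order in which the values appear in the table, and the variable names are illustrative.

```python
# FI values for the SSDS dataset, copied from Table III (five columns).
fi_ssds = {
    "RP":      [0.7723, 0.7723, 0.7693, 0.5296, 0.7723],
    "GP":      [0.8462, 0.8499, 0.6275, 0.6391, 0.8492],
    "PABIDOT": [0.8393, 0.8393, 0.8137, 0.8016, 0.8393],
    "SEAL":    [0.8395, 0.8395, 0.8395, 0.7882, 0.8395],
}

# For every column, select the algorithm with the highest fuzzy index.
winners = [
    max(fi_ssds, key=lambda name: fi_ssds[name][column])
    for column in range(5)
]
print(winners)
```

Three different algorithms win across the five columns for this one dataset, which illustrates the point made above that the ranking depends on the application as well as on the input data.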
6 Conclusion
This paper introduced a novel framework named Privacy Preservation as a Service (PPaaS), which tailors privacy preservation to stakeholders’ needs. PPaaS reduces the complexity of choosing the best data perturbation algorithm from a large number of privacy preservation algorithms. The ability to apply the best perturbation while preserving enough utility makes PPaaS an excellent solution for big data perturbation. In order to select the best privacy preservation method, PPaaS uses a fuzzy inference system (FIS) that enables PPaaS to generate ranks that are expressed as fuzzy indices for the privacy preservation algorithms applied to a dataset for a given application. The experimental results show that the fuzzy indices are a good indication of the capability of a particular privacy preservation algorithm to maintain a good balance between privacy and utility.
References
 [1] M. A. P. Chamikara, P. Bertok, D. Liu, S. Camtepe, and I. Khalil, “Efficient data perturbation for privacy preserving and accurate data stream mining,” Pervasive and Mobile Computing, vol. 48, pp. 1–19, 2018.
 [2] M. A. P. Chamikara, P. Bertok, I. Khalil, D. Liu, S. Camtepe, and M. Atiquzzaman, “A trustworthy privacy preserving framework for machine learning in industrial iot systems,” IEEE Transactions on Industrial Informatics, vol. 16, no. 9, pp. 6092–6102, 2020.

 [3] M. A. P. Chamikara, P. Bertok, I. Khalil, D. Liu, and S. Camtepe, “Privacy preserving face recognition utilizing differential privacy,” Computers & Security, 2020.
 [4] M. A. P. Chamikara, P. Bertok, D. Liu, S. Camtepe, and I. Khalil, “Efficient privacy preservation of big data for accurate data mining,” Information Sciences, 2019.
 [5] V. Torra, “Fuzzy microaggregation for the transparency principle,” Journal of Applied Logic, vol. 23, pp. 70–80, 2017.
 [6] A. Hasan, Q. Jiang, J. Luo, C. Li, and L. Chen, “An effective value swapping method for privacy preserving data publishing,” Security and Communication Networks, vol. 9, no. 16, pp. 3219–3228, 2016.
 [7] Y. A. A. S. Aldeen, M. Salleh, and M. A. Razzaque, “A comprehensive review on privacy preserving data mining,” SpringerPlus, vol. 4, no. 1, p. 694, 2015.
 [8] B. D. Okkalioglu, M. Okkalioglu, M. Koc, and H. Polat, “A survey: deriving private information from perturbed data,” Artificial Intelligence Review, vol. 44, no. 4, pp. 547–569, 2015.
 [9] C. Dwork, A. Roth et al., “The algorithmic foundations of differential privacy,” Foundations and Trends® in Theoretical Computer Science, vol. 9, no. 3–4, pp. 211–407, 2014.
 [10] M. A. P. Chamikara, P. Bertok, I. Khalil, D. Liu, and S. Camtepe, “Privacy preserving distributed machine learning with federated learning,” arXiv preprint arXiv:2004.12108, 2020.
 [11] M. A. P. Chamikara, P. Bertók, D. Liu, S. Camtepe, and I. Khalil, “An efficient and scalable privacy preserving algorithm for big data and data streams,” Computers & Security, vol. 87, p. 101570, 2019.
 [12] M. A. P. Chamikara, P. Bertok, I. Khalil, D. Liu, S. Camtepe, and M. Atiquzzaman, “Local differential privacy for deep learning,” IEEE Internet of Things Journal, 2019.
 [13] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang, “Deep learning with differential privacy,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2016, pp. 308–318.
 [14] R. Shokri and V. Shmatikov, “Privacy-preserving deep learning,” in Proceedings of the 22nd ACM SIGSAC conference on computer and communications security. ACM, 2015, pp. 1310–1321.
 [15] J. Zhao, Y. Chen, and W. Zhang, “Differential privacy preservation in deep learning: Challenges, opportunities and solutions,” IEEE Access, vol. 7, pp. 48 901–48 911, 2019.
 [16] C. C. Aggarwal, “Privacy-preserving data mining,” in Data Mining. Springer, 2015, pp. 663–693.
 [17] L. Xu, C. Jiang, Y. Chen, Y. Ren, and K. R. Liu, “Privacy or utility in data collection? a contract theoretic approach,” IEEE Journal of Selected Topics in Signal Processing, vol. 9, no. 7, pp. 1256–1269, 2015.
 [18] R. L. Wilson and P. A. Rosen, “Protecting data through ‘perturbation’ techniques: The impact on knowledge discovery in databases,” in Information Security and Ethics: Concepts, Methodologies, Tools, and Applications. IGI Global, 2008, pp. 1550–1561.
 [19] R. Agrawal and R. Srikant, “Privacy-preserving data mining,” in ACM Sigmod Record, vol. 29, no. 2. ACM, 2000, pp. 439–450.
 [20] E. Bertino, I. N. Fovino, and L. P. Provenza, “A framework for evaluating privacy preserving data mining algorithms,” Data Mining and Knowledge Discovery, vol. 11, no. 2, pp. 121–154, 2005.
 [21] K. Chen and L. Liu, “A random rotation perturbation approach to privacy preserving data classification,” The Ohio Center of Excellence in Knowledge-Enabled Computing, 2005. [Online]. Available: https://corescholar.libraries.wright.edu/knoesis/916/
 [22] ——, “Geometric data perturbation for privacy preserving outsourced data mining,” Knowledge and Information Systems, vol. 29, no. 3, pp. 657–695, 2011.
 [23] D. Bogdanov, S. Laur, and J. Willemson, “Sharemind: A framework for fast privacy-preserving computations,” Computer Security - ESORICS 2008, pp. 192–206, 2008.
 [24] S. Agrawal and J. R. Haritsa, “A framework for high-accuracy privacy-preserving mining,” in Data Engineering, 2005. ICDE 2005. Proceedings. 21st International Conference on. IEEE, 2005, pp. 193–204.
 [25] B. Thuraisingham, M. Kantarcioglu, E. Bertino, and C. Clifton, “Towards a framework for developing cyber privacy metrics: A vision paper,” in Big Data (BigData Congress), 2017 IEEE International Congress on. IEEE, 2017, pp. 256–265.
 [26] T. Drabas and D. Lee, Learning PySpark. Packt Publishing Ltd, 2017.
 [27] K. Chen and L. Liu, “Privacy preserving data classification with rotation perturbation,” in Data Mining, Fifth IEEE International Conference on. IEEE, 2005, pp. 4–pp.
 [28] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
 [29] I. H. Witten, E. Frank, M. A. Hall, and C. J. Pal, Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann, 2016. [Online]. Available: https://books.google.com.au/books?isbn=0128043571