Privacy-Preserving Multiparty Protocol for Feature Selection Problem

10/11/2021
by   Shinji Ono, et al.

In this paper, we propose a secure multiparty protocol for the feature selection problem. Let D be the set of data, F the set of features, and C the set of classes, where the feature value x(F_i) and the class x(C) are given for each x ∈ D and F_i ∈ F. For a triple (D, F, C), the feature selection problem is to find a consistent and minimal subset F' ⊆ F, where 'consistent' means that, for any x, y ∈ D, x(C) = y(C) if x(F_i) = y(F_i) for all F_i ∈ F', and 'minimal' means that any proper subset of F' is no longer consistent. The feature selection problem corresponds to finding a succinct description of D, and it has various applications in the field of artificial intelligence. In this study, we extend this problem to a privacy-preserving computation model for multiple users. We propose the first algorithm for the privacy-preserving feature selection problem based on fully homomorphic encryption. When parties A and B possess their own personal data D_A and D_B, they jointly compute the feature selection problem for the entire data set D_A ∪ D_B without revealing their private data, under the semi-honest model.


1 Introduction

1.1 Motivation and related works

Feature selection is one of the typical problems in machine learning. For example, the human genome consists of 3.1 billion base pairs, of which at most a few dozen are said to affect a particular disease. Feature selection extracts, from such very sparse data, a set of features that matches a specific purpose, and the results are used by various machine learning algorithms. We review the definition of the feature selection problem, its computational complexity, and the approximate solutions that have been proposed so far.

Feature selection is defined by a data set D, a feature set F = {F_1, …, F_n}, and a class set C. Here, in particular, we consider features and classes to be binary, but it is easy to extend the problem to the multi-label setting. Thus, a feature and a class for a data x ∈ D are denoted by x(F_i) ∈ {0, 1} and x(C) ∈ {0, 1}, and thus, when |F| = n, the data x is associated with a binary vector of length n + 1.

Given a triple (D, F, C), the goal of an algorithm is to extract a consistent and minimal F' ⊆ F, where F' is consistent if, for any x, y ∈ D, x(F_i) = y(F_i) for all F_i ∈ F' implies x(C) = y(C), and a feature set F' is minimal if any proper subset of F' is no longer consistent.
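To make the definition concrete, the following C++ sketch tests consistency directly from the definition: it groups the data by their projection onto F' and checks that no two data with the same projection have different classes. Function and variable names are illustrative, not part of the original algorithm.

```cpp
#include <map>
#include <vector>

// data[i] holds the binary feature vector of the i-th data item, cls[i] its class,
// and subset holds the (0-based) indices of the candidate feature set F'.
bool is_consistent(const std::vector<std::vector<int>>& data,
                   const std::vector<int>& cls,
                   const std::vector<int>& subset) {
  std::map<std::vector<int>, int> seen;              // projection onto F' -> class
  for (size_t i = 0; i < data.size(); i++) {
    std::vector<int> key;
    for (int f : subset) key.push_back(data[i][f]);  // x restricted to F'
    auto it = seen.find(key);
    if (it == seen.end()) seen[key] = cls[i];
    else if (it->second != cls[i]) return false;     // same projection, different class
  }
  return true;
}
```

On the dataset of Table 1 below, such a test returns false for the subset {F_1, F_2} (two data agree on F_1 and F_2 but have different classes) and true for {F_4, F_5}.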

F_1   F_2   F_3   F_4   F_5   C
1     0     1     1     1     0
1     1     0     0     0     0
0     0     0     1     1     0
1     0     1     0     0     0
1     1     1     1     0     1
0     1     0     1     0     1
0     1     0     0     1     1
0     0     0     0     1     1
I(F_i;C)   0.189   0.189   0.049   0.000   0.000
Table 1: An example dataset shown in [15]. The bottom row shows the mutual information score I(F_i;C) of each feature F_i with respect to the class C.

To our knowledge, the most common method for finding features that characterize C is to select features that show higher relevance in some statistical measure. The relevance of individual features can be estimated using statistical measures such as mutual information and Bayesian risk. For example, the bottom row of Table 1 shows the mutual information score of each feature with respect to the class labels. We can see that F_1 is more relevant than F_3, since I(F_1;C) = 0.189 > I(F_3;C) = 0.049. Based on the mutual information score, F_1 and F_2 of Table 1 would be selected to explain C. However, looking into D more closely, we understand that F_1 and F_2 cannot determine C uniquely. In fact, we find x, y ∈ D with x(F_1) = y(F_1) and x(F_2) = y(F_2) whose class labels are different. On the other hand, we can also find that F_4 and F_5 uniquely determine C by the formula C = F_4 ⊕ F_5, while I(F_4;C) = I(F_5;C) = 0 holds. Therefore, the traditional method based on relevance scores of individual features misses the right answer.

This problem is well known as the problem of interacting features, which has been intensively studied in machine learning research. The literature describes a class of feature selection algorithms that can solve this problem, referred to as consistency-based feature selection [14, 16, 17, 11, 2]. CWC (Combination of Weakest Components) [16] is the simplest of these consistency-based feature selection algorithms, and even though it uses the most rigorous measure, it shows one of the best performances in terms of both accuracy and computational speed compared to other methods [15].

1.2 Our contribution

 Algorithm Time Space
a naive CWC on plaintext [16]
  secure CWC (baseline)
improved
Table 2: Time and space complexities of the baseline and improved algorithms for secure CWC. Here, the cost of encryption and homomorphic operations (addition, multiplication, and comparison) for arbitrary integers is considered constant.

We extend the feature selection problem to multiple users, each having their own private dataset, and propose the first secure multi-party protocol to jointly compute feature selection over the entire data without revealing any private information.

Our proposed method is a two-party protocol based on fully homomorphic encryption. Here, we briefly review related work on homomorphic cryptosystems. Given a public-key cryptosystem E, let E(a) denote an integer a encrypted with its public key. If E(a+b) (the ciphertext of a+b) can be computed from E(a) and E(b) using only public information, in particular without decrypting E(a) and E(b), then E is said to be additively homomorphic, and if E(a·b) can also be computed from E(a) and E(b) in addition, then E is said to be fully homomorphic. Besides, when any plaintext a can be encrypted into any element of a set of sufficiently many ciphertexts, and such a ciphertext is chosen probabilistically for each execution of the encryption, E is said to be probabilistic. For a cryptosystem, being probabilistic is required to satisfy the security notion of ciphertext indistinguishability: given E(a_b) and the pair (a_0, a_1), where b ∈ {0, 1} is secretly selected uniformly at random, it is computationally infeasible to guess b.

In the last two decades, various homomorphic encryption schemes satisfying these homomorphic properties have been proposed. The first (probabilistic) additively homomorphic encryption was proposed by Paillier [12]. Somewhat homomorphic encryption schemes that allow a sufficient number of additions and a restricted number of multiplications have also been proposed [5, 6, 3], and by using these cryptosystems, we can compute more difficult problems, such as the inner product of two vectors. The first fully homomorphic encryption with an unlimited number of additions and multiplications was proposed by Gentry [9], and since then, useful libraries for fully homomorphic encryption have been developed, especially for bitwise operations and floating-point operations.

TFHE [7, 8] is known as one of the fastest fully homomorphic encryption schemes, specialized for bitwise operations. In this study, we use TFHE to design our algorithm for the multi-party feature selection protocol. We assume that parties A and B have their own private data D_A and D_B, and that the feature set F and the class C are common to both. Under the assumption that the parties can use their respective TFHE instances, say E_A and E_B, the goal of the parties is to jointly compute the result of the CWC algorithm on the plain data D_A ∪ D_B, without revealing any other information about D_A and D_B.

We summarize the results of this work in Table 2. The baseline is a naive algorithm that simulates the original CWC [16] over the ciphertexts using TFHE operations. Given (D, F, C), the essential task of CWC is to sort the features F_1, …, F_n in increasing order of their relevance to C. Using the sorted features, CWC decides whether or not each F_i should be selected for F'. The resulting features F' are the output of CWC.

It is well known that sorting, the main task in CWC, is one of the most difficult problems in secure computation. We therefore propose an improvement of the baseline algorithm that reduces the cost of sorting. We show the time and space complexities of both algorithms in Table 2; the improvement significantly reduces the time complexity while maintaining the space complexity. We also implemented the baseline algorithm and examined its running time on real data. As a result, it was confirmed that most of the time was spent on sorting. The implementation of the improved algorithm is left as future work.

2 Preliminaries

2.1 CWC algorithm over plaintext

For the dataset D associated with F and C, we generally assume that D contains no error, i.e., x(C) = y(C) holds whenever x(F_i) = y(F_i) for all F_i ∈ F. When D contains such errors, they are removed beforehand; as a result, D contains at most one data item with the same feature values.

We describe the original algorithm for finding a minimal consistent feature set in Algorithm 1. Given x ∈ D with x(C) = 1 and y ∈ D with y(C) = 0, x is called a positive data and y a negative data. Let ℓ be the number of positive data and m the number of negative data. Let x_i be the i-th positive data and y_j the j-th negative data. Then, for each feature F, the bit string b_F of length ℓm is defined by: b_F[i,j] = 1 if x_i(F) ≠ y_j(F) and b_F[i,j] = 0 otherwise. b_F[i,j] = 0 means that F is not consistent with the pair (x_i, y_j) because x_i(F) = y_j(F) despite x_i(C) ≠ y_j(C). Recall that F' is said to be consistent only if x(F_i) = y(F_i) for all F_i ∈ F' implies x(C) = y(C) for any x, y ∈ D. Thus, the consistency measure w_F is defined to be the number of 1s in b_F.

For a subset F' ⊆ F, F' is said to be consistent if, for any positive data x_i and negative data y_j, there exists F ∈ F' such that b_F[i,j] = 1 holds. Using this, CWC removes irrelevant features from F to construct a minimal consistent feature set (finding a smallest consistent feature set is clearly NP-hard due to an obvious reduction from the minimum set cover problem).

1:  Input: A dataset D associated with features F and class C.
2:  Output: A minimal consistent subset F' ⊆ F.
3:  Set F' ← F and sort the features in increasing order of w_F.
4:  Let σ(1), …, σ(n) be the sorted indices of the features.
5:  for k = 1, …, n do
6:     if F' ∖ {F_σ(k)} is consistent then
7:        update F' ← F' ∖ {F_σ(k)}
8:     end if
9:  end for
Algorithm 1 The algorithm CWC for plaintext
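The following C++ sketch simulates Algorithm 1 on plaintext, under the conventions introduced above: b_F is indexed by positive/negative pairs, w_F is the number of 1s in b_F, and features are examined for removal in increasing order of w_F. Variable names and the tie-breaking rule of the sort are illustrative assumptions, not the authors' implementation.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// pos[i][f] and neg[j][f] are the binary feature values of the i-th positive and
// j-th negative data. Returns the (0-based) indices of a minimal consistent subset.
std::vector<int> cwc_plaintext(const std::vector<std::vector<int>>& pos,
                               const std::vector<std::vector<int>>& neg) {
  const int n = pos.empty() ? 0 : (int)pos[0].size();   // number of features
  const int L = (int)(pos.size() * neg.size());          // number of (i, j) pairs
  std::vector<std::vector<int>> b(n, std::vector<int>(L));
  std::vector<int> w(n, 0);                               // w[f] = number of 1s in b_f
  for (int f = 0; f < n; f++) {
    int p = 0;
    for (const auto& x : pos)
      for (const auto& y : neg)
        w[f] += (b[f][p++] = (x[f] ^ y[f]));              // 1 iff f distinguishes the pair
  }
  std::vector<int> order(n);                              // sort features, weakest first
  std::iota(order.begin(), order.end(), 0);
  std::sort(order.begin(), order.end(), [&](int i1, int i2) { return w[i1] < w[i2]; });

  std::vector<bool> selected(n, true);                    // F' initially contains all features
  auto consistent = [&]() {                               // every pair covered by some kept feature
    for (int p = 0; p < L; p++) {
      bool covered = false;
      for (int f = 0; f < n && !covered; f++) covered = selected[f] && b[f][p];
      if (!covered) return false;
    }
    return true;
  };
  for (int f : order) {                                   // try to drop features in sorted order
    selected[f] = false;
    if (!consistent()) selected[f] = true;                // keep f if dropping it breaks consistency
  }
  std::vector<int> result;
  for (int f = 0; f < n; f++) if (selected[f]) result.push_back(f);
  return result;
}
```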

In Table 3, we show an example of D and the corresponding bit strings b_F. Let us consider the behavior of CWC on this example. All b_F are computed as preprocessing. Then, the features are sorted in increasing order of w_F. Following this order, CWC checks whether each feature can be removed from the current F'. By the consistency measure, CWC removes the redundant features, and the resulting F' is the output. In fact, we can then predict the class x(C) of each x ∈ D by a logical operation over the selected features.

0 1 1 0
1 0 1 0 0
1 1 0 0 0
0 1 0 1 0
0 0 1 1
1 0 1 0 0
1 1 0 0 0
Table 3: An example dataset. For a feature F and the pair of the i-th positive data x_i and the j-th negative data y_j, the bit b_F[i,j] is defined by the value of x_i(F) ⊕ y_j(F), and w_F = |b_F| counts the pairs with b_F[i,j] = 1.

2.2 TFHE: a faster fully homomorphic encryption

For the privacy-preserving CWC, D is entirely encrypted by a fully homomorphic encryption scheme. We review TFHE [8], one of the fastest libraries, which allows bitwise addition (XOR, '⊕') and bitwise multiplication (AND, '∧') over ciphertexts. On TFHE, any integer is encrypted bitwise: for a k-bit integer a = a_k ⋯ a_1, we denote its bitwise encryption (Enc(a_k), …, Enc(a_1)) by Enc(a) for short. The elementary operations are denoted by Enc(a_i) ⊕ Enc(b_i) and Enc(a_i) ∧ Enc(b_i) for encrypted bits Enc(a_i) and Enc(b_i). An encrypted array is denoted in the same way: for a sequence of integers, we abbreviate the concatenation of their bitwise encryptions as a single ciphertext.

By the elementary operations ⊕ and ∧, TFHE allows all arithmetic and logical operations. Here, we describe how to construct the adder and the comparison operation. Let a, b be k-bit integers and a_i, b_i be the i-th bits of a and b, respectively. Let c_i be the i-th carry-in bit and s_i the i-th bit of the sum a + b. Then, we can get s_i = a_i ⊕ b_i ⊕ c_i and c_{i+1} = (a_i ∧ b_i) ⊕ (c_i ∧ (a_i ⊕ b_i)) by the bitwise operations ⊕ and ∧ over ciphertexts. Based on the adder, we can construct other operations like subtraction, multiplication, and division. For example, a − b is obtained by a + (b̄ + 1), where b̄ is the bit complement of b obtained by b̄_i = b_i ⊕ 1 for each i-th bit. On the other hand, we also review the comparison. We want to compute Enc(z) without decrypting Enc(a) and Enc(b), where z = 1 if a < b and z = 0 otherwise. Here, we can obtain the bit z as the most significant bit of a − b computed over ciphertexts (with one extra bit of precision). Similarly, we can compute the encrypted bit for the equality test a = b.
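As an illustration, the following C++ sketch realizes the ripple-carry adder just described on encrypted bits, assuming the gate-bootstrapping interface of the TFHE library (bootsXOR, bootsAND, bootsCONSTANT, bootsCOPY). In this sketch, index 0 holds the least significant bit; this layout is an assumption of the sketch, not prescribed by the paper. A less-than test can then be built by computing a − b with one extra bit and reading its most significant bit.

```cpp
#include <tfhe/tfhe.h>

// Homomorphic ripple-carry adder: sum = a + b over nbits encrypted bits.
// sum, a, b are arrays of encrypted bits; index 0 is the least significant bit.
void encrypted_add(LweSample* sum, const LweSample* a, const LweSample* b,
                   int nbits, const TFheGateBootstrappingCloudKeySet* bk) {
  LweSample* carry = new_gate_bootstrapping_ciphertext_array(2, bk->params);
  LweSample* t     = new_gate_bootstrapping_ciphertext_array(3, bk->params);
  bootsCONSTANT(&carry[0], 0, bk);                 // c_0 = 0
  for (int i = 0; i < nbits; i++) {
    bootsXOR(&t[0], &a[i], &b[i], bk);             // a_i XOR b_i
    bootsXOR(&sum[i], &t[0], &carry[0], bk);       // s_i = a_i XOR b_i XOR c_i
    bootsAND(&t[1], &a[i], &b[i], bk);             // a_i AND b_i
    bootsAND(&t[2], &t[0], &carry[0], bk);         // (a_i XOR b_i) AND c_i
    bootsXOR(&carry[1], &t[1], &t[2], bk);         // c_{i+1}
    bootsCOPY(&carry[0], &carry[1], bk);
  }
  delete_gate_bootstrapping_ciphertext_array(3, t);
  delete_gate_bootstrapping_ciphertext_array(2, carry);
}
```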

Adopting those operations of TFHE, we design a secure multi-party CWC. In this paper, we omit the details of TFHE (see e.g., [7, 8]).

3 Algorithms

3.1 Baseline algorithm

We propose our baseline algorithm, which is a privacy-preserving version of CWC. In this subsection, we consider a two-party protocol, where a party A has his private data and outsources the computation of CWC to another party B, but our baseline algorithm is easily extended to a multi-party protocol where more than two parties cooperate with one another to select features over the joint data. During the computation, party B should not gain any information other than the number of positive data, the number of negative data, and the number of features. Note that party A can conceal the actual number of data by inserting dummy data and telling the inflated numbers ℓ and m to B. The algorithm can distinguish dummy data by adding an extra bit indicating that the data is a dummy iff the bit is 1. For each class, the values of the features and the dummy bit of the data in the class are encrypted by the public key of A and sent to B.

The algorithm consists of three steps: computing the encrypted bit strings b_F, sorting the b_F's, and executing feature selection on the b_F's.

Since all data in this subsection are encrypted by the public key of A, we omit the description of the encryption function to simplify the presentation.

3.1.1 Computing the b_F's

We can compute b_F[i,j] by (x_i(F) ⊕ y_j(F)) ∨ d_{x_i} ∨ d_{y_j}, where d_x represents the dummy bit of a data x. b_F[i,j] becomes 0 iff F is inconsistent with the pair of x_i and y_j. The part "∨ d_{x_i} ∨ d_{y_j}" is added to make the whole value 1 when one of the two data is a dummy. This step takes O(nℓm) time and space in total.
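A single bit of b_F can be computed over ciphertexts with three gates, directly following the formula above. The helper name is illustrative; the gate API is that of the TFHE library.

```cpp
#include <tfhe/tfhe.h>

// out = (xi_F XOR yj_F) OR d_xi OR d_yj, all arguments being single encrypted bits.
void encrypted_b_bit(LweSample* out,
                     const LweSample* xi_F, const LweSample* yj_F,
                     const LweSample* d_xi, const LweSample* d_yj,
                     const TFheGateBootstrappingCloudKeySet* bk) {
  LweSample* t = new_gate_bootstrapping_ciphertext_array(2, bk->params);
  bootsXOR(&t[0], xi_F, yj_F, bk);   // 1 iff F distinguishes the pair
  bootsOR(&t[1], d_xi, d_yj, bk);    // 1 iff one of the two data is a dummy
  bootsOR(out, &t[0], &t[1], bk);
  delete_gate_bootstrapping_ciphertext_array(2, t);
}
```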

3.1.2 Sorting the b_F's

We can compute w_F in encrypted form by summing up the ℓm values in b_F; stored exactly, this takes O(ℓm log(ℓm)) time per feature (noting that each operation on integers of k bits takes O(k) time). Instead, we can set an upper bound k on the number of bits used to store the consistency measure, which reduces the time complexity to O(ℓmk) per feature.
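A straightforward way to realize this summation is to add the encrypted bits of b_F one by one into a k-bit encrypted counter, reusing the encrypted_add sketch from Section 2.2. In this sketch the counter wraps modulo 2^k, which corresponds to keeping only the k lowest bits of the measure; whether the original implementation truncates in exactly this way is an assumption.

```cpp
#include <tfhe/tfhe.h>

// Assumed to be available from the adder sketch in Section 2.2.
void encrypted_add(LweSample* sum, const LweSample* a, const LweSample* b,
                   int nbits, const TFheGateBootstrappingCloudKeySet* bk);

// w = sum of the L encrypted bits bF[0..L-1], stored as a k-bit encrypted integer.
void encrypted_popcount(LweSample* w, const LweSample* bF, int L, int k,
                        const TFheGateBootstrappingCloudKeySet* bk) {
  LweSample* addend = new_gate_bootstrapping_ciphertext_array(k, bk->params);
  LweSample* next   = new_gate_bootstrapping_ciphertext_array(k, bk->params);
  for (int j = 0; j < k; j++) bootsCONSTANT(&w[j], 0, bk);        // w = 0
  for (int j = 1; j < k; j++) bootsCONSTANT(&addend[j], 0, bk);   // high bits of addend = 0
  for (int p = 0; p < L; p++) {
    bootsCOPY(&addend[0], &bF[p], bk);            // addend = (0, ..., 0, bF[p])
    encrypted_add(next, w, addend, k, bk);        // next = w + bF[p]  (mod 2^k)
    for (int j = 0; j < k; j++) bootsCOPY(&w[j], &next[j], bk);
  }
  delete_gate_bootstrapping_ciphertext_array(k, next);
  delete_gate_bootstrapping_ciphertext_array(k, addend);
}
```

Each of the L additions costs O(k) gate operations, so the loop performs O(Lk) = O(ℓmk) homomorphic operations per feature, matching the bound above.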

Then, sorting the b_F's in increasing order of their consistency measures can be done using any sorting network in which comparison and swap are conducted in encrypted form, without leaking any information about the ordering of the features. Note that, in this approach, the algorithm has to spend O(ℓm + k + log n) time to swap (or pretend to swap) the two bit strings of length ℓm and the original feature indices of ⌈log n⌉ bits, regardless of whether the two features are actually swapped or not. Since this is the heaviest part of our baseline algorithm, we will show how to improve it. Using the AKS sorting network [1] of size O(n log n), the total time for sorting the b_F's is O(n log n (ℓm + k + log n)).

In our experiments, we use the more practical sorting network of Batcher's odd-even mergesort [4] of size O(n log² n). Recently, a simple oblivious radix sort [10] was proposed that uses only a linear number of operations under the assumption that the bit length of each integer is constant.

3.1.3 Selecting features

Let F_σ(1), …, F_σ(n) be the sorted list of features. We first compute a sequence of bit strings S_1, …, S_{n+1} of length ℓm each such that S_k[p] = b_{F_σ(k)}[p] ∨ b_{F_σ(k+1)}[p] ∨ ⋯ ∨ b_{F_σ(n)}[p] for any k ≤ n and any position p, with S_{n+1} being the all-zero string; namely, S_k is the bit array storing the cumulative OR of each position over the suffix F_σ(k), …, F_σ(n). The computation takes O(nℓm) time and space.

For feature selection, we simulate Algorithm 1 on the encrypted S_k's and b_F's. In addition, we use two 0-initialized bit arrays: R of length n and T of length ℓm. R[k] is meant to store 1 iff the k-th feature (in sorted order) is selected. T is used to keep track of the cumulative OR of the bit strings of the currently selected features; namely, T[p] is set to b_{F_σ(i_1)}[p] ∨ ⋯ ∨ b_{F_σ(i_t)}[p] if the features F_σ(i_1), …, F_σ(i_t) have been selected at the moment.

Suppose that we are in the k-th iteration of the for loop of Algorithm 1. Note that F' ∖ {F_σ(k)} is consistent iff T[p] ∨ S_{k+1}[p] is 1 for every position p. Since we keep the k-th feature iff F' ∖ {F_σ(k)} is inconsistent, the algorithm sets R[k] = ¬⋀_p (T[p] ∨ S_{k+1}[p]). After computing R[k], we can correctly update T by T[p] ← T[p] ∨ (R[k] ∧ b_{F_σ(k)}[p]) for every p in O(ℓm) time.

Since each feature is processed in O(ℓm) time, the total computational time of this step is O(nℓm).
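The k-th iteration of this selection step can be sketched as follows over ciphertexts, using the update rules just described. Array and function names are illustrative; the gate API is that of the TFHE library.

```cpp
#include <tfhe/tfhe.h>

// One selection step: T and S_next are encrypted bit arrays of length L = ℓ·m,
// b_k is the k-th (in sorted order) feature's encrypted bit string, r_k one encrypted bit.
void select_step(LweSample* r_k, LweSample* T,
                 const LweSample* S_next, const LweSample* b_k,
                 int L, const TFheGateBootstrappingCloudKeySet* bk) {
  LweSample* t = new_gate_bootstrapping_ciphertext_array(3, bk->params);
  bootsCONSTANT(&t[0], 1, bk);                   // running AND over all positions
  for (int p = 0; p < L; p++) {
    bootsOR(&t[1], &T[p], &S_next[p], bk);       // T[p] OR S_next[p]
    bootsAND(&t[2], &t[0], &t[1], bk);
    bootsCOPY(&t[0], &t[2], bk);
  }
  bootsNOT(r_k, &t[0], bk);                      // keep the feature iff inconsistent
  for (int p = 0; p < L; p++) {
    bootsAND(&t[1], r_k, &b_k[p], bk);           // r_k AND b_k[p]
    bootsOR(&t[2], &T[p], &t[1], bk);
    bootsCOPY(&T[p], &t[2], bk);                 // T[p] |= r_k AND b_k[p]
  }
  delete_gate_bootstrapping_ciphertext_array(3, t);
}
```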

3.1.4 Summing up analysis

The bottleneck of the computational time is the cost of the sorting step, O(n log n (ℓm + k + log n)). Since CWC works with any consistency measure, we do not have to use w_F in full accuracy, and thus, we assume that the bit length k of the measure is set to a constant. Under this assumption, we obtain the following theorem.

Theorem 1

We can securely simulate CWC in O(nℓm log n) time and O(nℓm) space without revealing the private data of the parties, under the assumption that TFHE is secure.

  • According to the discussion above, computing b_F for all features takes O(nℓm) time and space, sorting the features takes O(nℓm log n) time, and selecting the features takes O(nℓm) time. Finally, party B computes in O(n) time an integer array that stores the original indices of the selected features. Party B randomly shuffles this array and sends it to party A as the result of CWC. Therefore, we can securely simulate CWC in O(nℓm log n) time and O(nℓm) space.

3.2 Improvement of secure CWC

1:  Assumption: E_A and E_B are A's and B's public-key encryption functions that are additively homomorphic and probabilistic, respectively.
2:   generates random and sends and to .
3:   generates random and sends and to .
4:   obtains .
Algorithm 2 Mix network between A and B for an encrypted integer array
1:  Preprocess: Parties and jointly compute where for -th feature .
2:  Assumption: Party possesses the encrypted by ’s public key .
3:  By the mix network, obtains a shuffled with a secret permutation by .
4:   sorts all pairs in by the increasing order of .
5:   sends the sorted with random noise as
6:   sends to .
7:   moves each to its correct position by .
8:   executes CWC for the sorted and obtains the cipher bit where iff is selected.
9:   decrypts and obtains the selected features to be shared with .
Algorithm 3 Improved secure CWC over ciphertext
Figure 1: Improved secure CWC. (1): Parties and jointly compute and for each feature . (2) and (3): obtain a shuffled data by the mix network between and . (4): sorts by . (5): receives the rank of with a random noise and returns their permutation to . (6): moves the bit string according to the information from . (7): and jointly compute the selected features by and selected features over the renamed space.

The task of sorting is a major bottleneck for CWC in the secure computation presented above. The reason is that pointers cannot be moved over ciphertexts. For example, consider the case of secure integer sorting. Let the variables X and Y contain the k-bit integers x and y, respectively. By performing the secure comparison, the encrypted bit c with c = 1 iff x < y is obtained. Using this logical bit c, we can swap the values of X and Y in O(k) time so that X ≤ Y holds, by the secure operations X' = (c ∧ X) ⊕ (c̄ ∧ Y) and Y' = (c̄ ∧ X) ⊕ (c ∧ Y), where c is applied bitwise and c̄ = c ⊕ 1.
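The oblivious compare-and-swap just described can be written compactly with TFHE's multiplexer gate bootsMUX (result = selector ? first : second). The helper encrypted_less_than is assumed to produce the encrypted comparison bit, e.g. via the subtraction-based test of Section 2.2; names are illustrative.

```cpp
#include <tfhe/tfhe.h>

// Assumed helper: c = [x < y] for k-bit encrypted integers X and Y (see Section 2.2).
void encrypted_less_than(LweSample* c, const LweSample* X, const LweSample* Y,
                         int k, const TFheGateBootstrappingCloudKeySet* bk);

// After the call, X holds the smaller and Y the larger of the two encrypted values.
void cond_swap(LweSample* X, LweSample* Y, int k,
               const TFheGateBootstrappingCloudKeySet* bk) {
  LweSample* c  = new_gate_bootstrapping_ciphertext_array(1, bk->params);
  LweSample* lo = new_gate_bootstrapping_ciphertext_array(k, bk->params);
  LweSample* hi = new_gate_bootstrapping_ciphertext_array(k, bk->params);
  encrypted_less_than(c, X, Y, k, bk);
  for (int i = 0; i < k; i++) {
    bootsMUX(&lo[i], c, &X[i], &Y[i], bk);       // lo = c ? X : Y
    bootsMUX(&hi[i], c, &Y[i], &X[i], bk);       // hi = c ? Y : X
  }
  for (int i = 0; i < k; i++) {
    bootsCOPY(&X[i], &lo[i], bk);
    bootsCOPY(&Y[i], &hi[i], bk);
  }
  delete_gate_bootstrapping_ciphertext_array(k, hi);
  delete_gate_bootstrapping_ciphertext_array(k, lo);
  delete_gate_bootstrapping_ciphertext_array(1, c);
}
```

In the baseline sorting network, the same conditional move must also be applied to the associated bit strings b_F of length ℓm and the feature indices, which is what makes each comparator cost O(ℓm + k + log n).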

However, in the case of CWC, each integer w_F of a feature F is associated with the bit string b_F of length ℓm. Since the ciphertexts cannot be decrypted, we cannot simply swap pointers to the bit strings. Therefore, the baseline algorithm swaps each b_F itself. As a result, the computation time for sorting increases to O(nℓm log n). We improve this time complexity to O(nℓm).

Since the improved algorithm uses the mix network mechanism [13] as a subroutine, we first give a brief overview of the mix network.

The purpose of a mix network is, given an encrypted sequence E(x_1), …, E(x_n), to obtain E(x_π(1)), …, E(x_π(n)) for a random permutation π, where the elements are re-encrypted and shuffled. Recall that E is a probabilistic encryption. Thus, we cannot know how the elements were shuffled by comparing the output with the original sequence. Between two parties A and B, the mix network can be realized using the public-key encryptions of A and B. We show such a mix network in Algorithm 2. We can assume that a party cannot obtain any information about the permutation chosen by the other party without decryption.

Using the mix network, we propose the improved secure CWC (Algorithm 3), which reduces the time complexity to O(nℓm). An example run of Algorithm 3 is illustrated in Fig. 1. As shown in this example, a party can securely sort the randomized features in O(n log n) time and then move each associated bit string of length ℓm in O(ℓm) time. After this preprocessing, the parties obtain a minimal consistent feature set by decrypting the output of CWC. Finally, we obtain the following result.

Theorem 2

Algorithm 3 can securely simulate CWC in O(nℓm) time and space without revealing the private data of the parties, under the assumption that TFHE is secure.

  • One party shuffles the encrypted measures w_1, …, w_n by a secret permutation π. The parties communicate only in steps 5 and 6. The party holding the decryption key can decrypt the received values, but due to the added random noise, he cannot learn anything about the original w_F's. On the other hand, the other party obtains the plaintext ordering information, but he cannot compute π. Thus, the parties cannot obtain the rank of an original feature from their own information alone. Therefore, the protocol of Algorithm 3 is as secure as TFHE. On the other hand, the time and space complexities are clear because the algorithm moves bit strings of length ℓm at most n times. It follows that the time complexity is reduced to O(nℓm).

4 Experiments

We implemented our baseline algorithm in C++ using the TFHE library for bitwise operations on fully homomorphic encryption. The experiments were conducted on a machine with an Intel Core i7-6567U (3.30GHz) CPU and 16GB of RAM. In the following, ℓ (resp. m) is the number of positive (resp. negative) data and n is the number of features.
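For reference, a minimal TFHE setup of the kind such an implementation relies on looks as follows: parameter and key generation, followed by bitwise encryption and decryption. The security parameter and the seed are illustrative values, not the ones used in our experiments.

```cpp
#include <tfhe/tfhe.h>

int main() {
  const int minimum_lambda = 110;                       // illustrative security level
  TFheGateBootstrappingParameterSet* params =
      new_default_gate_bootstrapping_parameters(minimum_lambda);
  uint32_t seed[] = {314, 1592, 657};                   // illustrative seed
  tfhe_random_generator_setSeed(seed, 3);
  TFheGateBootstrappingSecretKeySet* key =
      new_random_gate_bootstrapping_secret_keyset(params);

  // Encrypt one bit (e.g., a feature value x(F_i)); homomorphic gates would
  // then operate on such ciphertexts using the cloud key &key->cloud.
  LweSample* c = new_gate_bootstrapping_ciphertext_array(1, params);
  bootsSymEncrypt(&c[0], 1, key);
  int plain = bootsSymDecrypt(&c[0], key);              // recovers the bit 1

  delete_gate_bootstrapping_ciphertext_array(1, c);
  delete_gate_bootstrapping_secret_keyset(key);
  delete_gate_bootstrapping_parameters(params);
  return plain == 1 ? 0 : 1;
}
```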

Table 5 shows the time for computing b_F for a single feature, for three different sizes of ℓm. As the theoretical time bound suggests, the time increases linearly with ℓm. We note that each b_F can be computed independently of the other b_{F'} with F' ≠ F, and thus, they can be computed in parallel.

Table 4 shows the time for computing w_F for a single feature while changing the size of ℓm and the upper bound k of the number of bits used to store the consistency measure. Since there are ℓm additions to a k-bit integer, the time complexity is O(ℓmk). We can observe that the time per addition increases linearly with k. Note that the computation of w_F for all features can also be conducted in parallel.

ℓm     time [sec]
100    6.04
500    30.06
1000   60.14
Table 5: Time for computing b_F for a single feature
ℓm     k    time [sec]   time per addition [sec]
100    7    27.624       0.28
500    9    175.826      0.35
1000   10   394.967      0.40
Table 4: Time for computing w_F for a single feature

Table 6 shows the time for swapping a pair of data in the sorting procedure. Since the theoretical time complexity of a single swap is O(ℓm + k + log n), the time is mostly dominated by ℓm.

Since the whole procedure of sorting takes a long time, we estimate it from the time for a single swap in Table 6. Table 7 shows the estimated total time for sorting the b_F's with Batcher's odd-even mergesort (OEM). Here we assume that all the swaps are conducted serially (without utilizing the parallelism of the sorting network).

n     ℓm     k    ⌈log₂ n⌉   time [sec]
10    100    7    4          8.88
10    500    9    4          39.59
10    1000   10   4          78.05
50    100    7    6          9.05
50    500    9    6          39.73
50    1000   10   6          78.04
100   100    7    7          9.10
100   500    9    7          39.93
100   1000   10   7          77.94
Table 6: Time for swapping a pair of data
n     ℓm     # swaps   time [sec]
10    100    63        559.57
10    500    63        2494.23
10    1000   63        4917.40
50    100    543       4911.44
50    500    543       21573.39
50    1000   543       42376.26
100   100    1471      13386.10
100   500    1471      58732.62
100   1000   1471      114646.80
Table 7: Estimated total time for sorting the b_F's with OEM sort

Table 8 shows the time for feature selection from the sorted list of b_F's. The results follow the theoretical time complexity O(nℓm).

n     ℓm     time [sec]
10    100    111.96
10    500    558.17
10    1000   1114.24
50    100    589.35
50    500    2941.07
50    1000   5919.71
100   100    1179.06
100   500    5952.54
100   1000   11867.00
Table 8: Time for feature selection from the sorted list of b_F's

Table 9 summarizes the time for each step of our baseline algorithm under the assumption that parallelism is not used. The table shows that the sorting part is the bottleneck.

n     ℓm     Step 1 [sec]   Step 2 [sec]   Step 3 [sec]
10    100    60.37          835.81         111.96
10    500    300.59         4252.49        558.17
10    1000   601.41         8867.07        1114.24
50    100    301.85         6292.64        589.35
50    500    1502.95        30364.69       2941.07
50    1000   3007.05        62124.61       5919.71
100   100    603.70         16148.50       1179.06
100   500    3005.90        76315.22       5952.54
100   1000   6014.10        154143.50      11867.00
Table 9: Time for each step. Step 1: computing the b_F's. Step 2: sorting the b_F's. Step 3: feature selection.

As we can see from the experimental results (e.g., Table 9), most of the computational time of the baseline algorithm is spent on sorting. Thus, the implementation of the improved secure CWC is important future work. Although we have implemented a two-party protocol, our algorithms, including the improved secure CWC, can easily be extended to general multiparty protocols.

References

  • [1] M. Ajtai, J. Komlós, and E. Szemerédi (1983) An O(n log n) sorting network. In STOC, pp. 1–9.
  • [2] H. Almuallim and T. G. Dietterich (1994) Learning boolean concepts in the presence of many irrelevant features. Artificial Intelligence 69 (1–2), pp. 279–305.
  • [3] N. Attrapadung, G. Hanaoka, S. Mitsunari, Y. Sakai, K. Shimizu, and T. Teruya (2018) Efficient two-level homomorphic encryption in prime-order bilinear groups and a fast implementation in WebAssembly. In ASIACCS, pp. 685–697.
  • [4] K. E. Batcher (1968) Sorting networks and their applications. In AFIPS Spring Joint Computing Conference, pp. 307–314.
  • [5] D. Boneh, E.-J. Goh, and K. Nissim (2005) Evaluating 2-DNF formulas on ciphertexts. In TCC, pp. 325–341.
  • [6] Z. Brakerski, C. Gentry, and V. Vaikuntanathan (2012) (Leveled) fully homomorphic encryption without bootstrapping. In ITCS, pp. 309–325.
  • [7] I. Chillotti, N. Gama, M. Georgieva, and M. Izabachène (2020) TFHE: fast fully homomorphic encryption over the torus. Journal of Cryptology 33, pp. 34–91.
  • [8] I. Chillotti, N. Gama, M. Georgieva, and M. Izabachène (August 2016) TFHE: fast fully homomorphic encryption library. https://tfhe.github.io/tfhe/
  • [9] C. Gentry (2009) Fully homomorphic encryption using ideal lattices. In STOC, pp. 169–178.
  • [10] K. Hamada, K. Chida, D. Ikarashi, and K. Takahashi (2014) Oblivious radix sort: an efficient sorting algorithm for practical secure multi-party computation. IACR Cryptology ePrint Archive, Report 2014/121.
  • [11] H. Liu, H. Motoda, and M. Dash (1998) A monotonic measure for optimal feature selection. In ECML, pp. 101–106.
  • [12] P. Paillier (1999) Public-key cryptosystems based on composite degree residuosity classes. In EUROCRYPT, pp. 223–238.
  • [13] K. Sampigethaya and R. Poovendran (2006) A survey on mix networks and their secure applications. Proceedings of the IEEE 94 (12), pp. 2142–2181.
  • [14] K. Shin, D. Fernandes, and S. Miyazaki (2011) Consistency measures for feature selection: a formal definition, relative sensitivity comparison, and a fast algorithm. In IJCAI, pp. 1491–1497.
  • [15] K. Shin, T. Kuboyama, T. Hashimoto, and D. Shepard (2017) SCWC/SLCC: highly scalable feature selection algorithms. Information 8 (4), 159.
  • [16] K. Shin and X. M. Xu (2009) Consistency-based feature selection. In KES, pp. 28–30.
  • [17] Z. Zhao and H. Liu (2007) Searching for interacting features. In IJCAI, pp. 1156–1161.