Secure Social Recommendation based on Secret Sharing

02/06/2020 ∙ by Chaochao Chen, et al. ∙ Ant Financial

Nowadays, privacy preserving machine learning has been drawing much attention in both industry and academia. Meanwhile, recommender systems have been extensively adopted by many commercial platforms (e.g. Amazon), and they are mainly built on user-item interactions. It is well known that social information, which is rich on social platforms such as Facebook, is useful to recommender systems, so combining social information with user-item ratings is expected to improve the overall recommendation performance. Most existing recommendation models are built on the assumption that the social information is available. However, different platforms are usually reluctant to (or cannot) share their data due to certain concerns. In this paper, we first propose a SEcure SOcial RECommendation (SeSoRec) framework which can (1) collaboratively mine knowledge from the social platform to improve the recommendation performance of the rating platform, and (2) securely keep the raw data of both platforms. We then propose a Secret Sharing based Matrix Multiplication (SSMM) protocol to optimize SeSoRec and prove its correctness and security theoretically. By applying minibatch gradient descent, SeSoRec has linear time complexities in terms of both computation and communication. Comprehensive experimental results on three real-world datasets demonstrate the effectiveness of our proposed SeSoRec and SSMM.


1 Introduction

Nowadays, recommender systems have been extensively used in many commercial platforms [chen2018distributed]. The key point for recommendation is to use as much information as possible to learn better preferences of users and items. To achieve this, besides user-item interaction information, additional information such as social relationship and contextual information have been utilized [ma2011recommender, rendle2010factorization, chen2018semi].

Existing work usually assumes that all kinds of information are available, which is inconsistent with most real-world cases. In practice, different kinds of information are located on different platforms, e.g., huge amounts of user-item interaction information on Amazon and rich user social information on Facebook. However, different platforms are reluctant to (or cannot) share their own data for competition or regulation reasons.

Therefore, for recommendation platforms that have rich user-item interaction data, a crucial question is how to use additional data, such as social information on other platforms, to further improve recommendation performance while protecting the raw data security of both platforms. This research topic is worth studying in both industry and academia.

Secure Multi-Party Computation (MPC) provides a solution to the above question. MPC aims to jointly compute a function for multiple parties while keeping the individual inputs private [yao1986generate], and it has been adopted by many machine learning algorithms for secure data mining, including decision trees [lindell2005secure], linear regression [nikolaenko2013privacy], and logistic regression [mohassel2017secureml]. However, it has not yet been applied to the above-mentioned secure multi-party recommendation problem.

In this paper, we consider the scenario where user-item interaction information and user social information are on different platforms, which is quite common in practice. One platform (the rating platform) has user-item interaction information and another platform (the social platform) has user social information; the challenge is to improve the recommendation performance of the rating platform by securely using the user social information on the social platform. To this end, we formalize secure social recommendation as an MPC problem and propose a SEcure SOcial RECommendation (SeSoRec) framework for it. Our proposed SeSoRec is able to (1) collaboratively mine knowledge from the social platform to improve the recommendation performance of the rating platform, and (2) keep the raw data of both platforms secure. We further propose a novel Secret Sharing based Matrix Multiplication (SSMM) protocol to optimize SeSoRec, and we also prove its correctness and security. Our proposed SeSoRec and SSMM have linear computation and communication complexities. Experimental results on three real-world datasets demonstrate the effectiveness of our proposed SeSoRec and SSMM.

We summarize our main contributions as follows:

  • We observe a secure social recommendation problem in practice, formalize it as an MPC problem, and propose a SeSoRec framework for it.

  • We propose a novel Secret Sharing based Matrix Multiplication (SSMM) protocol to optimize SeSoRec, and we also prove its correctness and security.

  • Our proposed SeSoRec and SSMM have linear computation and communication complexities.

  • Experimental results conducted on three real-world datasets demonstrate the effectiveness of SeSoRec and SSMM.

2 Background

In this section, we review related backgrounds, including (1) social recommendation, (2) secure multi-party computation, and (3) privacy preserving recommendation.

2.1 Social Recommendation

Factorization based recommendation [mnih2007probabilistic, koren2009matrix, chen2018distributed, chen2018semi] is one of the most popular approaches in recommender systems. It factorizes a user-item rating (or other interaction) matrix into a user latent matrix and an item latent matrix. However, traditional factorization based approaches assume that users are independent and identically distributed, which is inconsistent with the reality that users are inherently connected via various types of social relations such as friendships and trust relations. Therefore, social factorization models take social relationships into account to improve recommendation performance [ma2011recommender], and the basic intuition is that connected users are likely to have similar preferences. According to [tang2013social], social factorization models can be formally stated as:
social factorization model = basic factorization model + social information model.

To date, different social information models have been proposed to capture social information, all of which build on this intuition.

2.2 Secure Multi-Party Computation

The concept of secure Multi-Party Computation (MPC) was formally introduced in [yao1982protocols]. It aims to generate methods (or protocols) for multiple parties to jointly compute a function (e.g., vector multiplication) over their inputs (e.g., vectors for each party) while keeping those inputs private. MPC can be implemented using different protocols, such as garbled circuits [yao1986generate], GMW [goldreich1987play], and secret sharing [shamir1979share]. MPC has been applied to many machine learning algorithms, such as decision trees [lindell2005secure], linear regression [nikolaenko2013privacy], logistic regression [mohassel2017secureml], and collaborative filtering [shmueli2017secure]. In this paper, we propose a secret sharing based matrix multiplication algorithm for secure social recommendation.

2.3 User Privacy Preserving Recommendation

Another related research area is privacy preserving recommendation. Recently, user privacy has drawn a lot of attention, and how to train models while preserving user privacy has become a hot research topic [mcmahan2016communication, chen2018]. Some works adopt garbled circuits to protect user privacy while making recommendations [nikolaenko2013privacymf]. Other works use differential privacy to protect user privacy while training recommendation models [mcsherry2009differentially, hua2015differentially, meng2018personalized].

Difference between user privacy and data security. User privacy preserving recommendation aims to protect user privacy on the customer side (2C), while data security based recommendation intends to protect the data security of business partners who have already collected users’ private data (2B). In this paper, we aim to (1) integrate rating platform and social platform for better recommendation, and (2) protect the data security of both platforms.

3 The Proposed Model

In this section, we first formally describe the secure social recommendation problem, and then present our proposed SEcure SOcial RECommendation (SeSoRec) framework for this problem.

3.1 Problem Definition

Formally, consider a user-item interaction (rating) platform with its own user set and item set. Let R be the user-item interaction matrix on this platform, with element $r_{ij}$ being the rating of user $i$ on item $j$; the total number of observed ratings is the data size. Let U and V denote the user and item latent factor matrices, with their column vectors $u_i$ and $v_j$ being the latent factors for user $i$ and item $j$, respectively. Consider also a user social platform, and assume that the social platform has the same user set as the user-item interaction platform. We further let S be the user-user social matrix (our model can be slightly modified to handle the case where S is asymmetric), with element $s_{if}$ being the social relationship strength between user $i$ and user $f$.

The problem of secure social recommendation is the following: the rating platform and the social platform each securely keep their own data and model, while the rating platform improves its recommendation performance by utilizing the social information of the social platform.

3.2 Secure Social Recommendation Framework

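Social recommendation can be formalized as a basic factorization model plus a social information model, based on the assumption that connected users tend to have similar preferences, as described in Section 2.1. Most existing social factorization models have the following objective function

$$\min_{U,V}\ \mathcal{L} = \mathcal{L}_{b} + \gamma\, \mathcal{L}_{s} \tag{1}$$

where $\mathcal{L}_{b}$ is the loss of the basic factorization model that restricts the relationship between the true ratings and predicted ratings, $\mathcal{L}_{s}$ is the loss of the social information model that restricts the preferences of users who have social relations, and $\gamma$ controls the social restriction strength. A classical example is the Social Regularizer recommendation (Soreg) approach [ma2011recommender], where

$$\mathcal{L}_{b} = \frac{1}{2}\sum_{i}\sum_{j} I_{ij}\left(r_{ij} - u_i^{\top} v_j\right)^2 + \frac{\lambda}{2}\left(\|U\|_F^2 + \|V\|_F^2\right) \tag{2}$$

$$\mathcal{L}_{s} = \frac{1}{2}\sum_{i}\sum_{f} s_{if}\,\|u_i - u_f\|_F^2 \tag{3}$$

where $r_{ij}$ is the rating of user $i$ on item $j$, $u_i$ and $v_j$ are the latent factor columns of U and V, $s_{if}$ is the social relationship strength between users $i$ and $f$, $\lambda$ is the regularization strength, $I_{ij}$ is the indicator function that equals 1 if there is an existing user-item interaction pair and 0 otherwise, and $\|\cdot\|_F$ is the Frobenius norm.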
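As a rough illustration of the "basic factorization model plus social information model" structure, the following pure-Python sketch evaluates a Soreg-style objective on toy data. The function name `soreg_objective`, the weights `lam` and `gamma`, the toy values, and the choice to store each user's latent vector as a list row (rather than as a matrix column) are assumptions made for this example only.

```python
def soreg_objective(R, I, S, U, V, lam=0.1, gamma=0.1):
    """Squared rating error + L2 penalty + gamma * social regularizer.

    R, I: ratings and 0/1 observation indicators, indexed [user][item].
    S:    social strengths, indexed [user][user].
    U, V: U[i] and V[j] are the latent vectors of user i and item j.
    """
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    rating = 0.5 * sum(
        I[i][j] * (R[i][j] - dot(U[i], V[j])) ** 2
        for i in range(len(U)) for j in range(len(V))
    )
    l2 = 0.5 * lam * (sum(dot(u, u) for u in U) + sum(dot(v, v) for v in V))
    # Connected users are pulled towards similar latent vectors.
    social = 0.5 * sum(
        S[i][f] * sum((a - b) ** 2 for a, b in zip(U[i], U[f]))
        for i in range(len(U)) for f in range(len(U))
    )
    return rating + l2 + gamma * social

# Toy data: 3 users, 2 items, 2 latent dimensions (all values hypothetical).
R = [[5.0, 3.0], [4.0, 0.0], [0.0, 1.0]]
I = [[1, 1], [1, 0], [0, 1]]
S = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]   # users 0 and 1 are connected
U = [[1.0, 0.5], [0.0, 1.0], [0.2, 0.2]]
V = [[2.0, 1.0], [1.0, 1.0]]

base = soreg_objective(R, I, S, U, V, gamma=0.0)
with_social = soreg_objective(R, I, S, U, V, gamma=1.0)
```

Note that the social term needs entries of S together with rows of U, which is exactly the combination of data that sits on two different platforms in the secure setting.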

Traditional social recommendation frameworks such as Soreg can be efficiently solved by stochastic Gradient Descent (GD). However, the social information model in Equation (3) involves a real number (a social relationship strength) that belongs to the social platform and two real-valued vectors (the latent factors of the two users concerned) that are located on the rating platform. Secure computation is therefore not guaranteed, because the rating platform, which knows both vectors, can easily deduce the values belonging to the social platform.

To solve this problem, we propose to use minibatch GD instead of stochastic GD. We use B to denote the user-item rating set in the current minibatch, whose size is the batch size. Consider the user set and the item set in the current batch, together with their sizes; clearly, neither size exceeds the batch size. We likewise restrict the rating matrix and the indicator matrix to the current batch, and take the latent factors of the corresponding users and items in the current minibatch. Equation (1) becomes

(4)

where is a diagonal matrix with diagonal element , is the social matrix of the users in current minibatch, and is also a diagonal matrix with diagonal element . The gradients of in Equation (4) with respect to and are

(5)
(6)

where is a diagonal matrix with diagonal element which is get by extracting the corresponding users’ diagonal elements from E in current batch.

We observe in Equations (5) and (6) that three matrix product terms are crucial. Each of these terms involves one matrix on the rating platform (U or its restriction to the current minibatch) and another matrix on the social platform (the minibatch social matrix or one of the two diagonal matrices). All the other terms can be calculated locally by the rating platform. Therefore, we conclude that the key to secure social recommendation is the secure matrix multiplication operation, which is a secure MPC problem. We summarize the proposed SEcure SOcial RECommendation (SeSoRec) solution in Algorithm 1, and will present how to perform secure matrix multiplication in the next section.
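The minibatch quantities used above (the batch, its user and item sets, and the batch-restricted rating and indicator matrices) can be sketched as follows. This is a minimal illustration; the helper names `sample_minibatch` and `batch_rating_matrices` and the triple-based rating format are hypothetical and not taken from the paper.

```python
import random

def sample_minibatch(ratings, batch_size, seed=0):
    """ratings: list of (user, item, rating) triples held by the rating platform."""
    rng = random.Random(seed)
    batch = rng.sample(ratings, batch_size)
    users = sorted({u for u, _, _ in batch})   # user set of the batch
    items = sorted({i for _, i, _ in batch})   # item set of the batch
    return batch, users, items

def batch_rating_matrices(batch, users, items):
    """Dense rating and indicator matrices restricted to the batch users and items."""
    row = {u: a for a, u in enumerate(users)}
    col = {i: b for b, i in enumerate(items)}
    R_b = [[0.0] * len(items) for _ in users]
    I_b = [[0] * len(items) for _ in users]
    for u, i, r in batch:
        R_b[row[u]][col[i]] = float(r)
        I_b[row[u]][col[i]] = 1
    return R_b, I_b

ratings = [(0, 0, 5), (0, 2, 3), (1, 1, 4), (2, 0, 1), (3, 2, 2), (3, 1, 5)]
batch, users, items = sample_minibatch(ratings, batch_size=3)
R_b, I_b = batch_rating_matrices(batch, users, items)
```

Because every rating in the batch contributes at most one new user and one new item, the number of batch users and the number of batch items can never exceed the batch size.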

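Input: the observed rating matrix (R) on the rating platform, the user social matrix (S) on the social platform, the regularization strengths, the learning rate, and the maximum number of iterations
Output: the user latent matrix (U) and the item latent matrix (V) on the rating platform
1 The rating platform initializes U and V
2 for each iteration up to the maximum number of iterations do
3       The rating platform and the social platform calculate the matrix products that involve social data based on the secure matrix multiplication in Algorithm 2
4       The rating platform locally calculates the gradient with respect to the user latent factors based on Equation (5)
5       The rating platform locally calculates the gradient with respect to the item latent factors based on Equation (6)
6       The rating platform locally updates U by a gradient step with the learning rate
7       The rating platform locally updates V by a gradient step with the learning rate
8 end for
return U and V on the rating platform
Algorithm 1 Secure social recommendation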

4 Secret Sharing based Matrix Multiplication

In this section, we first describe technical preliminaries, and then present a secure matrix multiplication protocol, followed by its correctness and security proof.

4.1 Preliminaries

Secret Sharing. Our proposal relies on additive sharing. We briefly review it here and refer the reader to [demmler2015aby] for more details. To additively share an $\ell$-bit value $a$ between two parties, the party holding $a$ generates a share $a_1$ uniformly at random from the integers modulo $2^\ell$, sends $a_1$ to the other party, and keeps $a_0 = a - a_1 \bmod 2^\ell$. Each party thus holds exactly one share of $a$. To reconstruct an additively shared value $a$, each party sends its share to the one who computes $a_0 + a_1 \bmod 2^\ell$. In this paper, all secret shares are additive shares of this form.
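The sharing and reconstruction steps described above can be sketched in a few lines of Python. This is a minimal two-party illustration; the bit length `ELL = 64` and the function names are assumptions of this sketch, not values from the paper.

```python
import secrets

ELL = 64                  # bit length of shared values (assumed for this sketch)
MOD = 1 << ELL

def share(value):
    """Split value into two additive shares modulo 2**ELL."""
    sent = secrets.randbelow(MOD)      # uniformly random share for the other party
    kept = (value - sent) % MOD        # share kept by the owner of the value
    return kept, sent

def reconstruct(share0, share1):
    """Recombine the two shares."""
    return (share0 + share1) % MOD

def add_shares(x_shares, y_shares):
    """Each party adds the shares it holds; no communication is needed."""
    return tuple((a + b) % MOD for a, b in zip(x_shares, y_shares))

x_sh, y_sh = share(1234), share(5678)
total = reconstruct(*add_shares(x_sh, y_sh))
```

Each share on its own is uniformly distributed, so holding a single share gives no information about the underlying value.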

Apply to decimal numbers. The above protocol cannot work directly with decimal numbers, since it is not possible to sample uniformly from the real numbers [cock2015fast]. Following the existing work [mohassel2017secureml], we approximate decimal arithmetic by using fixed-point arithmetic. First, fixed-point addition is trivial. Second, for fixed-point multiplication, we use the following strategy. Suppose $x$ and $y$ are two decimal numbers with at most $f$ bits in the fractional part. We first transform them to integers by letting $x' = 2^{f} x$ and $y' = 2^{f} y$, and then calculate $z = x' y'$. Finally, the last $f$ bits of $z$ are truncated so that it has at most $f$ bits representing the fractional part. The correctness of the above truncation technique for secret sharing can be found in [mohassel2017secureml].
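The fixed-point strategy above can be illustrated on plain (unshared) numbers as follows. The number of fractional bits `FRAC_BITS = 13` is an arbitrary choice for this sketch, and applying the truncation to secret shares requires the additional argument given in [mohassel2017secureml], which is not reproduced here.

```python
FRAC_BITS = 13            # fractional bits; an illustrative choice, not a value from the text

def encode(x):
    """Map a decimal number to a scaled integer."""
    return int(round(x * (1 << FRAC_BITS)))

def decode(n):
    """Map a scaled integer back to a decimal number."""
    return n / (1 << FRAC_BITS)

def fixed_mul(a, b):
    """Multiply two encoded numbers, then truncate the last FRAC_BITS bits."""
    return (a * b) >> FRAC_BITS

a, b = encode(3.25), encode(-1.5)
product = decode(fixed_mul(a, b))          # both inputs are exactly representable
approx = decode(fixed_mul(encode(0.1), encode(0.2)))   # small truncation error
```

Numbers such as 3.25 and -1.5 are exactly representable with 13 fractional bits, so their product is exact; values such as 0.1 and 0.2 incur a small rounding and truncation error.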

Simulation-based Security Proof. To formally prove that a protocol is secure, we adopt the semi-honest point of view [goldreich2004FCV], where each participant truthfully obeys the protocol while being curious about the other parties’ original data. Under the real world and ideal world simulation-based proof [lindell2017simulate], whatever can be computed by one party can be simulated given only the messages it receives during the protocol, which implies that each party learns nothing from the protocol execution beyond what they can derive from messages received in the protocol. To formalize our security proof, we need the following notations:

  • We use $f$ to denote a function of two variables, whose inputs could be encodings of any mathematical objects, e.g. integers, vectors, matrices, or even functionals. We also use $\pi$ to denote a two-party protocol for computing $f$.

  • The view of the $i$-th party during the execution of $\pi$ consists of the input of the $i$-th party, its internal random bits, and the messages received or derived by the $i$-th party during the execution of $\pi$. Note that these messages include all the intermediate messages received, all information derived from the intermediate messages, and also the output of the $i$-th party during the protocol.

  • A probability ensemble $X = \{X(a, n)\}$ is an infinite sequence of random variables indexed by $a$ and $n$. In the context of secure multiparty computation, $a$ represents each party's input and $n$ represents the problem size.

Definition 1

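Two probability ensembles $X = \{X(a, n)\}$ and $Y = \{Y(a, n)\}$ are said to be computationally indistinguishable, denoted by $X \stackrel{c}{\equiv} Y$, if for every non-uniform polynomial-time algorithm $D$ and every polynomial $p(\cdot)$, there exists an $N$ such that for every $a$ and every $n > N$,

$$\left|\Pr[D(X(a, n)) = 1] - \Pr[D(Y(a, n)) = 1]\right| \le \frac{1}{p(n)}.$$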

Definition 2

Let be a function. We say a two-party protocol computes with information leakage to party and to party where each party is viewed as semi-honest adversaries, if there exist probabilistic polynomial-time algorithms and such that

where and .

4.2 Secret Sharing based Matrix Multiplication

Secure matrix multiplication is the key to SeSoRec. There are several approaches for secure matrix multiplication, such as homomorphic encryption [han2008privacy, teo2012study, dumas2016private] and the secret sharing scheme [de2017efficient], among which secret sharing is much more efficient. Existing secret sharing based matrix multiplication [de2017efficient] either needs a trusted Initializer (a trusted third party) or expensive cryptographic primitives [keller2018overdrive] to generate randomness before computation, i.e., Beaver’s pre-computed multiplication triplet [beaver1991efficient]. We call it Trusted Initializer based Secure Matrix Multiplication (TISMM), which may not be applicable in reality. Besides, TISMM needs to generate many random matrices, causing efficiency concerns.
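To make the role of the trusted initializer concrete, the sketch below shows a textbook Beaver-triple style matrix multiplication on additive shares. It is an illustrative variant written for this document, not necessarily the exact TISMM protocol of [de2017efficient]; the helper names and the 64-bit modulus are assumptions, and both parties are simulated inside one function.

```python
import secrets

MOD = 1 << 64

def rand_matrix(rows, cols):
    return [[secrets.randbelow(MOD) for _ in range(cols)] for _ in range(rows)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) % MOD for col in zip(*B)] for row in A]

def add(A, B):
    return [[(a + b) % MOD for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[(a - b) % MOD for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def share(M):
    """Additively share a matrix between two parties."""
    r = rand_matrix(len(M), len(M[0]))
    return sub(M, r), r

def beaver_matmul(P, Q):
    x, y, z = len(P), len(Q), len(Q[0])
    # Trusted initializer: random A, B and C = AB, all distributed as shares.
    A, B = rand_matrix(x, y), rand_matrix(y, z)
    A0, A1 = share(A)
    B0, B1 = share(B)
    C0, C1 = share(matmul(A, B))
    # Each input owner secret-shares its private matrix.
    P0, P1 = share(P)
    Q0, Q1 = share(Q)
    # Both parties open the masked matrices E = P - A and F = Q - B.
    E = add(sub(P0, A0), sub(P1, A1))
    F = add(sub(Q0, B0), sub(Q1, B1))
    # Local output shares; the public product EF is added by party 0 only.
    Z0 = add(add(add(matmul(E, F), matmul(E, B0)), matmul(A0, F)), C0)
    Z1 = add(add(matmul(E, B1), matmul(A1, F)), C1)
    return Z0, Z1

P = [[1, 2], [3, 4]]
Q = [[5, 6], [7, 8]]
Z0, Z1 = beaver_matmul(P, Q)
result = add(Z0, Z1)
```

The sketch makes the cost visible: besides the input shares, three extra random matrices (A, B, and the shares of C) have to be produced by a party that both sides trust.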

In this paper, we propose a novel protocol for secure and efficient matrix multiplication using secret sharing. Suppose two parties hold a matrix P and a matrix Q separately, where the inner dimension y (the number of columns of P, which equals the number of rows of Q) is an even number (one can always make y even by adding an additional zero column to P and a zero row to Q). Our algorithm generalizes the inner product algorithm proposed in [zhu2015ESP] to compute the matrix product PQ. We first summarize our proposed Secret Sharing based Matrix Multiplication (SSMM) in Algorithm 2, and then prove its correctness and security.
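The zero-padding trick for an odd inner dimension can be checked with a short sketch. The helper name `pad_to_even` is hypothetical; the point is only that the appended zero column and zero row leave the product PQ unchanged.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def pad_to_even(P, Q):
    """Append a zero column to P and a zero row to Q if the inner dimension is odd."""
    if len(Q) % 2 == 0:
        return P, Q
    return [row + [0] for row in P], Q + [[0] * len(Q[0])]

P = [[1, 2, 3], [4, 5, 6]]          # 2 x 3, so the inner dimension is odd
Q = [[1, 0], [0, 1], [2, 2]]        # 3 x 2
P_pad, Q_pad = pad_to_even(P, Q)
same_product = matmul(P_pad, Q_pad) == matmul(P, Q)
```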

4.3 Correctness Proof

According to Algorithm 2, we have

(7)
(8)
(9)
(10)

Equation (8) is obtained by substituting the exchanged matrices into Equation (7) according to Algorithm 2 (Line 4 and Line 5). Equation (9) holds by simplifying Equation (8). The $(i, j)$-th entry of a matrix product is the inner product of the $i$-th row of the left factor and the $j$-th column of the right factor. Finally, by matrix definition, the $(i, j)$-th entry of the product of the odd (resp. even) sub-matrices is the inner product of the odd (resp. even) terms in the $i$-th row of the left factor and the odd (resp. even) terms in the $j$-th column of the right factor, so the full product equals the sum of the two sub-matrix products and the last three terms in Line 4 are cancelled. Thus, the correctness is proved.
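The even/odd decomposition used in the last step of the proof can be verified numerically. The sketch below checks, on random integer matrices, that a matrix product equals the sum of the product of the odd sub-matrices and the product of the even sub-matrices; the variable names are chosen for this illustration only.

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

rng = random.Random(0)
x, y, z = 3, 4, 2                       # the inner dimension y is even
P = [[rng.randint(-9, 9) for _ in range(y)] for _ in range(x)]
Q = [[rng.randint(-9, 9) for _ in range(z)] for _ in range(y)]

# 1st, 3rd, ... columns of P pair with 1st, 3rd, ... rows of Q; likewise for the rest.
P_odd, P_even = [row[0::2] for row in P], [row[1::2] for row in P]
Q_odd, Q_even = Q[0::2], Q[1::2]

full = matmul(P, Q)
split = add(matmul(P_odd, Q_odd), matmul(P_even, Q_even))
```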

4.4 Security Proof

Theorem 1

Protocol of SSMM (Algorithm 2) computes matrix multiplication with information leakage to and information leakage to

We first give some intuitive discussions on the information disclosure of Algorithm 2. Let and be sub-matrices of P constructed by its even columns and odd columns. Similarly let and be sub-matrices of Q constructed by its even and odd rows. As indicated in line 4 of Algorithm 2, has , from . By extracting the even column sub-matrix and odd column sub-matrix of , can calculate . Since , we have . Thus, can compute by subtracting from . Similar arguments will show that can compute as partial information obtained from . Although and both have some level of information disclosed as discussed above, their own private data are still unrevealed.

We then rigorously prove the security level of SSMM, using the preliminary techniques we have given above. Note that we first assume all the matrices are finite field (), and then apply fixed point decimal arithmetics. Without loss of generality, we first let be the adversary and quantify the information leakage to . The view of in real world contains all information of matrices P and (including their even and odd column sub-matrices), together with and . The key point of the proof is to construct a simulator which can reproduce the same distribution of and . The simulator for ’s view proceeds like this:

  • Assume has as prior knowledge;

  • generate random matrix

    ;

  • Calculate .

and are similarly defined as the even column and odd column sub-matrices of . We claim that has the same distribution as , thus being computationally indistinguishable. To see this, we first notice that is the difference between a random matrix and a fixed matrix Q, which is equally distributed as a random matrix, say . With this in mind, it can be seen similarly that is equally distributed as and is equally distributed as . Therefore, is equally distributed as . Moreover, can reproduce all information of matrices P and . So with additional information of , the ideal world simulator successfully reconstructs the view of , which is equivalent to say that in the real world, only partial information has been disclosed to after running the protocol.

A similar simulator can be constructed when the other party is assumed to be the adversary. This completes the security proof.

Input: A private matrix for , and a private matrix for
Output: A matrix for , and a matrix for , such that
1 and locally generate random matrices and
2 locally extracts even columns and odd columns from , and get and
3 locally extracts even rows and odd rows from , and get and
4 computes and , and sends and to
5 computes and , and sends and to
6 locally computes
7 locally computes
8 sends N to , and calculates
return for
Algorithm 2 Secret Sharing based Matrix Multiplication (SSMM)
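The message flow of Algorithm 2 can be simulated with the following sketch. The overall structure (private random matrices, an even/odd split, two messages in each direction, local computation of M and N, and a final sum) follows the steps listed above, but the concrete masking formulas in the code are assumptions of this sketch: they were chosen so that all terms involving the random matrices cancel and M + N = PQ, and they are not claimed to be the paper's exact expressions. No security claim is made for the sketch itself.

```python
import secrets

MOD = 1 << 64   # all arithmetic is done on integers modulo 2**64

def rand_matrix(rows, cols):
    return [[secrets.randbelow(MOD) for _ in range(cols)] for _ in range(rows)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) % MOD for col in zip(*B)] for row in A]

def lin(a, A, b, B):
    """Entrywise a*A + b*B modulo MOD."""
    return [[(a * s + b * t) % MOD for s, t in zip(ra, rb)] for ra, rb in zip(A, B)]

def ssmm_sketch(P, Q):
    x, y, z = len(P), len(Q), len(Q[0])
    assert y % 2 == 0, "pad P and Q with zeros first if the inner dimension is odd"
    # Line 1: each party draws a private random matrix shaped like its input.
    Pr, Qr = rand_matrix(x, y), rand_matrix(y, z)
    # Lines 2-3: split the random matrices into two interleaved halves.
    Pr_e, Pr_o = [r[1::2] for r in Pr], [r[0::2] for r in Pr]
    Qr_e, Qr_o = Qr[1::2], Qr[0::2]
    # Line 4: messages from the holder of P to the holder of Q (assumed form).
    P1, P2 = lin(1, P, 1, Pr), lin(1, Pr_e, 1, Pr_o)
    # Line 5: messages from the holder of Q to the holder of P (assumed form).
    Q1, Q2 = lin(1, Qr, -1, Q), lin(1, Qr_e, -1, Qr_o)
    # Line 6: local computation by the holder of P.
    M = lin(1, matmul(lin(1, P, 2, Pr), Q1), 1, matmul(lin(1, P2, 1, Pr_o), Q2))
    # Line 7: local computation by the holder of Q.
    N = lin(1, matmul(P1, lin(2, Q, -1, Qr)), -1, matmul(P2, lin(1, Q2, 1, Qr_e)))
    # Line 8: N is sent to the holder of P, who outputs M + N.
    return M, N

P = [[1, 2, 3, 4], [5, 6, 7, 8]]
Q = [[1, 0, 2], [0, 1, 0], [3, 0, 1], [0, 2, 0]]
M, N = ssmm_sketch(P, Q)
product = lin(1, M, 1, N)
```

Unlike the trusted-initializer approach, all randomness here is generated locally by the two parties, which is the practical advantage the section attributes to SSMM.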
      
  • generate random numbers for all non-zero entries

  • randomly select entries from the zero entries and generate random numbers for these entries

end for
The value is the selected number of non-zero entries of all rows in Q, and the above new strategy makes small in order to guarantee that the secret shares from to are sparse. However, as becomes smaller, would obtain more information on Q. An extremal case is , in which can infer the overall sparsity of . Therefore, a reasonable way is to choose . Note that, in practice, one should keep its strategies of choosing (i.e., the ratio of for each row) privately in case of information leakage. As long as Q is sparse, and are both sparse and can be calculated when generating . The computational complexity for matrix multiplication decreases to , and the communication complexity from to decreases to , and thus they are significantly reduced compared to the general case analysis. We remark that the above new secret sharing strategy for sparse matrix exactly satisfy our requirement in SeSoRec. Usually the social matrix S is sparse. When the user social platform shares its secrets, it can use the above new strategy to generate its secret shares. Moreover, the choice of can be private to only so that the user-item interaction platform cannot gain more information based on the shares from .
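The sparse sharing strategy described above (random values on all non-zero positions plus a privately chosen number of randomly selected zero positions per row) can be sketched as follows. The function name `sparse_mask_row`, the dictionary representation of a sparse row, and the parameter `extra_zeros` are assumptions made for illustration.

```python
import secrets

def sparse_mask_row(row, extra_zeros, rng, modulus=1 << 64):
    """Random mask for one row, stored sparsely as {column: random value}.

    All non-zero positions are masked, plus `extra_zeros` randomly chosen
    zero positions that act as decoys hiding the true sparsity pattern.
    """
    nonzero = [j for j, v in enumerate(row) if v != 0]
    zero = [j for j, v in enumerate(row) if v == 0]
    decoys = rng.sample(zero, min(extra_zeros, len(zero)))
    return {j: rng.randrange(1, modulus) for j in nonzero + decoys}

rng = secrets.SystemRandom()             # OS-backed randomness
Q = [[0, 7, 0, 0, 0, 2], [0, 0, 0, 0, 0, 0], [5, 0, 0, 1, 0, 0]]
mask = [sparse_mask_row(row, extra_zeros=2, rng=rng) for row in Q]
```

A larger number of decoy positions hides the sparsity pattern better but makes the shares denser, which is the trade-off the paragraph above describes.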

5 Analysis

In this section, we analyze the time complexity of SeSoRec and discuss its usage and information leakage.

5.1 Complexity Analysis of SeSoRec

We first analyze the communication and computation complexities of SeSoRec, as shown in Algorithm 1. The relevant quantities are the total number of users, the numbers of users and items in the current minibatch, the dimension of the latent factors, and the number of ratings (data size).

Communication Complexity. The communications come from the calculations of , , and using SSMM. First, for and , by referring to the complexity analysis of the modified SSMM, their communication costs are both for each minibatch, and are both for passing the dataset once. Second, for , the communication of U only needs to be done once for each data pass, and therefore, its communication cost is . To this end, the total communication costs are for passing the dataset once. Since, and , the total communication cost is linear in the data size.

Computation Complexity. Suppose the average number of neighbors for each user on platform is . The time complexity of lines 6 and 7 in Algorithm 2 is for each minibatch, and is for passing the dataset once. Similarly, the time complexity of the lines 3 and 4 in Algorithm 1 for passing the dataset once is . Since , the total computation cost is also linear with data size.

By applying minibatch gradient descent, the communication and computation complexities of SeSoRec are both linear in the data size, and thus SeSoRec can scale to large datasets.

5.2 Discussion

Secure common user identification. Our proposed SeSoRec assumes that the rating platform and the social platform have the same user set in common, so that they can carry out SSMM. The essence of secure common user identification is private set intersection (PSI), for which existing work [pinkas2014faster] provides efficient solutions. PSI can be applied to identify the common users of the two platforms privately before adopting SeSoRec in practice, which guarantees that nothing is revealed except the IDs of the common users.

Information leakage. SeSoRec is asymmetric for the two parties: the rating platform and the social platform collaboratively conduct SSMM and return the results to the rating platform. Therefore, the social platform reveals more information to the rating platform. Although we have proven the security of SSMM, it may still cause information leakage from the social platform when the rating platform maliciously initiates SSMM iteratively. Suppose the two platforms calculate PQ using SSMM; the party holding P can infer Q by varying P while Q stays fixed and repeating this procedure for enough rounds. A naive solution is to set a constraint on Q when conducting SSMM. As long as Q (users in each minibatch) is different in each iteration, SeSoRec will have no information leakage. We leave better solutions to future work. Moreover, when one matrix is sparse in SSMM and the strategy for choosing the number of selected entries per row is exposed, the social platform may leak some social information to the rating platform. Specifically, under this circumstance, the sparsity of the social matrix on the social platform is leaked to the rating platform, although the specific social values are still protected. Therefore, it is crucial that the social platform keeps its selection for each row of the social matrix private.
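The repeated-query leakage described above follows from the protocol output alone, independently of how securely each single run is implemented. The sketch below models one protocol run as a black box that returns PQ to the querying party; the function `run_protocol` and the toy matrix are hypothetical.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

Q_SECRET = [[3, 0, 1], [0, 4, 0], [2, 0, 5]]   # stays fixed across rounds

def run_protocol(P):
    """Stand-in for one protocol run: the querying party only sees the product PQ."""
    return matmul(P, Q_SECRET)

recovered = []
for k in range(len(Q_SECRET)):
    # A malicious querying party submits the k-th unit row vector as P.
    unit_row = [[1 if c == k else 0 for c in range(len(Q_SECRET))]]
    recovered.append(run_protocol(unit_row)[0])   # round k reveals row k of Q
```

After as many rounds as Q has rows, the querying party has reconstructed Q completely, which is why changing Q between iterations is suggested as a countermeasure.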

6 Experiments

In this section, we perform experiments to answer the following questions. Q1: how does SeSoRec perform compared with classic matrix factorization and unsecure social recommendation models? Q2: how does SSMM perform compared with the existing TISMM? Q3: how does the social parameter affect our model performance?

6.1 Setting

We first describe the datasets, metrics, and comparison methods we use in experiments.

Datasets. We use three public real-world datasets, i.e., Epinions [massa2007trust], FilmTrust [guo2013novel] and Douban Movie [Zhong2012CAT]. All these datasets contain user-item ratings and user social (trust) information, and are widely adopted in literature. Note that although rating and social information are both available in these datasets, we realistically assume that they are located on separate platforms without any possibility of data sharing, which has no side-effect on experiments.

Dataset     #user    #item    #rating     #social
Epinions    8,619    5,539    229,920     232,461
FilmTrust   1,508    2,071    35,497      1,853
Douban      13,530   13,363   2,530,594   264,811
Table 1: Dataset statistics, assuming that the rating information exists on the rating platform and the social information is available on the social platform.

Since the original rating matrices of Epinions and Douban are too sparse, we filter out the users and items with fewer than 20 interactions. Table 1 shows the statistics of these datasets after preprocessing. We use five-fold cross validation to conduct experiments and evaluate model performance: we split each dataset into five parts, and each time we use four parts as the training set and the remaining part as the test set.

Metrics. To evaluate model performance, we adopt two types of metrics, Root Mean Square Error (RMSE) and Normalized Discounted Cumulative Gain (NDCG@n), both of which are popularly used to evaluate factorization based recommendation performance in literature [koren2009matrix, he2015trirank]. RMSE is defined as

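$$\mathrm{RMSE} = \sqrt{\frac{1}{|\mathcal{T}|}\sum_{(i,j)\in\mathcal{T}} \left(r_{ij} - \hat{r}_{ij}\right)^2}$$

where $\hat{r}_{ij}$ is the predicted rating of user $i$ on item $j$, $r_{ij}$ is the corresponding real rating, and $|\mathcal{T}|$ is the number of predictions in the test dataset $\mathcal{T}$. RMSE evaluates the error between real ratings and predicted ratings, with smaller values indicating better performance. NDCG@n is defined as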

$$\mathrm{NDCG@}n = Z_n \sum_{p=1}^{n} \frac{2^{r_p} - 1}{\log_2(p + 1)}$$

where $Z_n$ is a normalizer to ensure that the perfect ranking has value 1 and $r_p$ is the relevance (real rating) of the item at position $p$. NDCG evaluates the ranking performance of recommendation models, with larger values being better. We report NDCG@10 in experiments, and abbreviate it as NDCG.
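Both metrics can be computed with a few lines of Python. The sketch below assumes the common exponential-gain form of DCG with a base-2 logarithmic discount, which may differ in detail from the exact variant used in the experiments; the function names and toy inputs are illustrative.

```python
import math

def rmse(pairs):
    """pairs: iterable of (real_rating, predicted_rating) over the test set."""
    pairs = list(pairs)
    return math.sqrt(sum((r - p) ** 2 for r, p in pairs) / len(pairs))

def dcg(relevances, n):
    """Discounted cumulative gain of the first n positions (exponential gain assumed)."""
    return sum((2 ** rel - 1) / math.log2(pos + 2) for pos, rel in enumerate(relevances[:n]))

def ndcg_at_n(ranked_relevances, n=10):
    """ranked_relevances: real ratings of items in the order the model ranks them."""
    ideal = dcg(sorted(ranked_relevances, reverse=True), n)
    return dcg(ranked_relevances, n) / ideal if ideal > 0 else 0.0

error = rmse([(3, 3), (5, 3)])
perfect = ndcg_at_n([5, 4, 3])
reversed_order = ndcg_at_n([3, 4, 5])
```

Dividing by the DCG of the ideal ordering plays the role of the normalizer, so a perfect ranking scores exactly 1 and any other ordering of distinct ratings scores less.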

Comparison methods. Our proposed SeSoRec is a novel secure social recommendation model, which is a secure version of Soreg [ma2011recommender]. We compare SeSoRec with the following latent factor models:

  • MF [mnih2007probabilistic] is a classic latent factor model, which only uses the user-item interaction information on the rating platform. This is the situation where the social platform is reluctant to share raw social information with the rating platform.

  • Soreg [ma2011recommender] is a classic social recommendation model, which is unsecure in the sense that the rating platform needs the raw data of the social platform.

Note that we do not compare with the state-of-the-art recommendation methods. The reason is: (1) most of them assume the recommendation platform has many different kinds of information such as contextual information [rendle2010factorization], which are unfair for our method to compare with, and (2) our focus is to study the difference between traditional unsecure social recommendation models and our proposed secure social recommendation model.

Dataset     Metric   MF       Soreg    SeSoRec
Epinions    RMSE     1.2687   1.1791   1.1789
Epinions    NDCG     0.0363   0.0405   0.0401
FilmTrust   RMSE     1.1907   1.1754   1.1752
FilmTrust   NDCG     0.2042   0.2128   0.2124
Douban      RMSE     0.7489   0.7420   0.7419
Douban      NDCG     0.0749   0.0780   0.0778
Table 2: Performance comparison on three datasets, in terms of RMSE and NDCG.

Dimension   100      1000     10000
SSMM        0.0025   0.3246   40.744
TISMM       0.0060   0.7279   105.83
Table 3: Running time (in seconds) comparison of SSMM and TISMM for square matrices of the given dimension.

Hyper-parameters. We set the latent factor dimension , batch size , and vary regularizer and learning rate to choose their best values. We also vary in to study its effects on SeSoRec. For other parameters, e.g., regularizer and learning rate , we use grid search to find their best values of each model.

6.2 Comparison Results (To Q1)

We report the comparison results on three datasets in Table 2. From it, we can observe the following. (1) Soreg and SeSoRec consistently outperform MF. Moreover, we find that the sparser the dataset is, the more Soreg and SeSoRec improve over MF. Taking RMSE as an example, SeSoRec improves over MF by 7.60%, 1.3%, and 0.98% on Epinions, FilmTrust, and Douban, whose rating densities are 0.48%, 1.14%, and 1.4%, respectively. The results indicate that social information is indeed important to recommendation performance, especially when data is sparse. (2) Soreg and SeSoRec achieve almost the same recommendation accuracy; the small differences come from the fixed-point representation of decimal numbers in secret sharing. This result further validates the correctness of our proposed SSMM, in addition to the theoretical proof.

6.3 Comparison between SSMM and TISMM (To Q2)

As described in the SSMM section, the existing Trusted Initializer based Secure Matrix Multiplication (TISMM) [de2017efficient] needs a trusted initializer (a trusted third party) to generate secrets before computation. Although TISMM may not be applicable in practice, we would like to compare the efficiency of our proposed SSMM with it. To this end, we randomly generate two square matrices P and Q of the same dimension. We then report the running time (in seconds) of calculating PQ using both algorithms in Table 3, measured over a local area network. SSMM costs much less time than TISMM, with a speedup of around 2.4 times on average. This is because TISMM needs to generate more random matrices and involves more matrix operations. Moreover, SSMM does not rely on a trusted initializer, which may be difficult to find in practice, and is thus more practical.
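The average speedup quoted above can be reproduced directly from the running times in Table 3. The short sketch below only restates that arithmetic; the variable names are illustrative.

```python
# Running times in seconds from Table 3, for dimensions 100, 1000, and 10000.
ssmm_seconds = [0.0025, 0.3246, 40.744]
tismm_seconds = [0.0060, 0.7279, 105.83]

speedups = [t / s for s, t in zip(ssmm_seconds, tismm_seconds)]
average_speedup = sum(speedups) / len(speedups)
```

The per-dimension ratios are all between roughly 2.2 and 2.6, so the average of about 2.4 is not dominated by a single matrix size.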

6.4 Parameter Analysis (To Q3)

(a) Effect on RMSE
(b) Effect on NDCG@10
Figure 1: Effect of on FilmTrust dataset.

Finally, we study the effect of social regularizer parameter on SeSoRec. Social recommendation can be formalized as a basic factorization model plus a social information model. The social regularizer parameter controls the contribution of social information model to the final model performance. The larger is, the more likely that the latent factors of connected users are similar, and therefore the more social information model will contribute to the overall performance. Figure 1 shows its effects on FilmTrust dataset in terms of both RMSE and NDCG@10. It can be seen that with a good choice of , SeSoRec can balance the contribution of user-item rating data on platform and user social data on platform , and thus, achieve the best performance.

7 Conclusion and Future Work

In this paper, we proposed a secret sharing based secure social recommendation framework, which can not only mine knowledge from social platform to improve the recommendation performance of the rating platform, but also keep the raw data of both platforms securely. Specifically, we first formalized secure social recommendation as a MPC problem and proposed a SEcure SOcial RECommendation (SeSoRec) framework for it. We then proposed a novel Secret Sharing based Matrix Multiplication (SSMM) algorithm to optimize it, and proved its correctness and security. Besides, we analyzed that SeSoRec has linear communication and computation complexities and thus can scale to large datasets. Experimental results on real-world datasets demonstrated that, SeSoRec achieves almost the same accuracy as the existing unsecure social recommendation model, and SSMM significantly outperforms the existing trusted initializer based secure matrix multiplication protocol. In the future, we would like to solve the potential information leakage problem of SeSoRec with better solutions.

References

Complexity Analysis of SSMM. The computational complexity mainly comes from Line 6 and 7 in Algorithm 2, which is . The communication complexity from to depends on matrices and , both of which are . The communication complexity from to depends on matrices , , and N, which are , , and , respectively, and in total.

When one of the matrices is sparse, we can slightly modify the secret sharing strategy in Algorithm 2 such that both the computational and communication complexities are reduced accordingly. Without loss of generality, we assume Q is sparse in the sense that for the rows in Q the average number of non-zero entries is . When generating , does not make it so dense as in Line 1 of Algorithm 2. The new strategy for generating is as follows:

for each row in Q do

5 Analysis

In this section, we analyze the time complexity of SeSoRec and discuss its usage and information leakage.

5.1 Complexity Analysis of SeSoRec

We first analyze the communication and computation complexities of SeSoRec (Algorithm 1). The analysis is expressed in terms of the total number of users, the numbers of users and items in the current minibatch, the dimension of the latent factors, and the number of ratings (the data size).

Communication Complexity. The communication comes from the three matrix products that SeSoRec computes using SSMM. For the first two products, by referring to the complexity analysis of the modified SSMM, the communication cost is incurred for each minibatch and accumulates over one pass of the dataset. For the third product, U only needs to be communicated once per data pass. Summing these costs over one pass of the dataset, the total communication cost is linear in the data size.

Computation Complexity. The computation cost depends on the average number of neighbors of each user on the social platform. The time complexity of Lines 6 and 7 in Algorithm 2 is incurred for each minibatch and accumulates over one pass of the dataset; the same holds for Lines 3 and 4 in Algorithm 1. In total, the computation cost is also linear in the data size.

By applying minibatch gradient descent, the communication and computation complexities of SeSoRec are both linear in the data size, so SeSoRec can scale to large datasets.

5.2 Discussion

Secure common user identification. SeSoRec assumes that the rating platform and the social platform share a common set of users, so that they can carry out SSMM. The essence of secure common user identification is private set intersection (PSI), for which existing work [pinkas2014faster] provides efficient solutions. PSI can be applied to identify the common users of the two platforms privately before adopting SeSoRec in practice, which guarantees that nothing is revealed except the IDs of the common users.
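To make the idea of PSI concrete, the following is a toy sketch of a Diffie-Hellman style intersection, which is a different and simpler construction than the one cited above. The modulus, the hash-to-group mapping, and the function names (`hash_to_group`, `blind`, `toy_psi`) are assumptions chosen for illustration, and the parameters are not suitable for real deployments.

```python
import hashlib
import secrets

# Toy modulus: the Mersenne prime 2**127 - 1 (illustrative only, not a secure choice).
P = 2**127 - 1

def hash_to_group(user_id: str) -> int:
    """Hash a user ID to an integer modulo P (zero only with negligible probability)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % P

def blind(values, key):
    """Raise every value to a secret exponent modulo P."""
    return [pow(v, key, P) for v in values]

def toy_psi(ids_a, ids_b):
    """Return the IDs held by party A that party B also holds."""
    key_a = secrets.randbelow(P - 3) + 2   # A's secret exponent
    key_b = secrets.randbelow(P - 3) + 2   # B's secret exponent
    # Round 1: each party blinds its own hashed IDs and sends them over.
    a_once = blind([hash_to_group(x) for x in ids_a], key_a)
    b_once = blind([hash_to_group(y) for y in ids_b], key_b)
    # Round 2: B blinds A's values again (keeping their order); A blinds B's values.
    a_twice = blind(a_once, key_b)
    b_twice = set(blind(b_once, key_a))
    # Exponentiation commutes, so equal doubly blinded values indicate equal IDs.
    return [x for x, t in zip(ids_a, a_twice) if t in b_twice]

common = toy_psi(["u1", "u2", "u3"], ["u3", "u4", "u1"])
```

In this sketch both parties run inside one function for brevity; in a real protocol each party would keep its exponent to itself and only the blinded values would cross the network.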

Information leakage. SeSoRec is asymmetric for the two parties: the rating platform and the social platform collaboratively conduct SSMM, and the results are returned to the rating platform. Therefore, the social platform reveals more information to the rating platform. Although we have proven the security of SSMM, it may still leak information of the social platform when the rating platform maliciously initiates SSMM repeatedly. Suppose the two platforms calculate PQ using SSMM, where Q belongs to the social platform. The rating platform can infer Q by varying P while Q stays fixed, and repeating this procedure for enough rounds. A naive solution is to set a constraint on Q when conducting SSMM: as long as Q (the users in each minibatch) is different in each iteration, SeSoRec has no information leakage. We leave better solutions as future work. Moreover, when one matrix in SSMM is sparse and the selection strategy of the sparse variant is exposed, the social platform may leak some social information to the rating platform. Specifically, under this circumstance the sparsity of the social matrix is leaked to the rating platform, although the specific social values are still protected. Therefore, it is crucial that the social platform keeps its selection for each row of the social matrix private.
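The repeated-query risk described above can be illustrated with a small sketch. It assumes the worst case in which the party holding P may choose P freely and Q never changes; `ssmm_result` is a hypothetical stand-in that simply returns the plain product, since the product is all that the initiating party observes from a secure protocol.

```python
def matmul(P, Q):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def ssmm_result(P, Q):
    # Hypothetical stand-in for a secure protocol: only the product PQ is returned.
    return matmul(P, Q)

def probe_fixed_q(secret_q):
    """Recover Q row by row when the same Q is reused in every round."""
    n = len(secret_q)
    recovered = []
    for k in range(n):
        # Round k: submit the k-th unit row vector as P, which selects row k of Q.
        unit_row = [[1 if j == k else 0 for j in range(n)]]
        recovered.append(ssmm_result(unit_row, secret_q)[0])
    return recovered

secret_q = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]   # toy social matrix
recovered_q = probe_fixed_q(secret_q)
```

Unit vectors are only the simplest choice; any set of queries whose stacked P has full column rank would let the initiator solve for Q, which is why changing Q in every iteration removes this particular attack.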

6 Experiments

In this section, we perform experiments to answer the following questions. Q1: how does SeSoRec perform compared with classic matrix factorization and unsecure social recommendation models? Q2: how does SSMM perform compared with the existing TISMM? Q3: how does the social regularizer parameter affect model performance?

6.1 Setting

We first describe the datasets, metrics, and comparison methods we use in experiments.

Datasets. We use three public real-world datasets, i.e., Epinions [massa2007trust], FilmTrust [guo2013novel], and Douban Movie [Zhong2012CAT]. All of these datasets contain user-item ratings and user social (trust) information, and are widely adopted in the literature. Note that although rating and social information are both available in these datasets, we assume, as in a realistic setting, that they are located on separate platforms without any possibility of data sharing. This assumption has no side effect on the experiments.

Dataset     #user    #item    #rating     #social
Epinions    8,619    5,539    229,920     232,461
FilmTrust   1,508    2,071    35,497      1,853
Douban      13,530   13,363   2,530,594   264,811

Table 1: Dataset statistics. Rating information is assumed to reside on the rating platform and social information on the social platform.

Since the original rating matrices of Epinions and Douban are too sparse, we filter out the users and items with fewer than 20 interactions. Table 1 shows the statistics of the datasets after preprocessing. We use five-fold cross validation to conduct experiments and evaluate model performance: we split each dataset into five parts, and each time we use four parts as the training set and the remaining part as the test set.
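The five-fold protocol can be sketched as follows. This is a minimal illustration rather than the authors' evaluation code; the function name `five_fold_splits`, the fixed seed, and the toy rating triples are assumptions.

```python
import random

def five_fold_splits(ratings, n_folds=5, seed=0):
    """Yield (train, test) lists; each fold serves as the test set exactly once."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled ratings round-robin into n_folds parts.
    folds = [shuffled[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [r for j, fold in enumerate(folds) if j != k for r in fold]
        yield train, test

# Toy (user, item, rating) triples, purely illustrative.
toy = [(u, i, (u + i) % 5 + 1) for u in range(10) for i in range(10)]
splits = list(five_fold_splits(toy))
```

Reported metrics would then be averaged over the five (train, test) pairs.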

Metrics. To evaluate model performance, we adopt two metrics, Root Mean Square Error (RMSE) and Normalized Discounted Cumulative Gain (NDCG@n), both of which are widely used to evaluate factorization-based recommendation models in the literature [koren2009matrix, he2015trirank]. RMSE is defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{|\tau|} \sum_{(i,j) \in \tau} \left( r_{ij} - \hat{r}_{ij} \right)^2},$$

where $r_{ij}$ is the real rating and $\hat{r}_{ij}$ is the predicted rating of user $i$ on item $j$, and $|\tau|$ is the number of predictions in the test dataset $\tau$. RMSE evaluates the error between real ratings and predicted ratings, with smaller values indicating better performance. NDCG@n is defined as

$$\mathrm{NDCG@}n = Z_n \sum_{i=1}^{n} \frac{2^{r_i} - 1}{\log_2 (i + 1)},$$

where $Z_n$ is a normalizer that ensures the perfect ranking has value 1, and $r_i$ is the relevance (real rating) of the item at position $i$. NDCG evaluates the ranking performance of recommendation models, with larger values being better. We report NDCG@10 in experiments, and abbreviate it as NDCG.
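As a concrete reading of the two metrics, the sketch below assumes the standard definitions: RMSE as the square root of the mean squared difference between real and predicted ratings, and NDCG@n as a discounted gain of the form (2^r - 1) / log2(position + 1) normalized by the gain of the ideal ranking. The function names and the choice to normalize over the supplied list are assumptions.

```python
import math

def rmse(pairs):
    """pairs: iterable of (real_rating, predicted_rating) over the test set."""
    pairs = list(pairs)
    return math.sqrt(sum((r - p) ** 2 for r, p in pairs) / len(pairs))

def dcg_at_n(relevances, n):
    # Positions are 1-based: position i contributes (2^r - 1) / log2(i + 1).
    return sum((2 ** r - 1) / math.log2(i + 1)
               for i, r in enumerate(relevances[:n], start=1))

def ndcg_at_n(ranked_relevances, n=10):
    """ranked_relevances: real ratings of items in the order the model ranks them."""
    ideal = dcg_at_n(sorted(ranked_relevances, reverse=True), n)
    return dcg_at_n(ranked_relevances, n) / ideal if ideal > 0 else 0.0

example_rmse = rmse([(3, 3), (5, 3)])    # one exact prediction, one off by 2
example_ndcg = ndcg_at_n([1, 3], n=2)    # the more relevant item is ranked second
```

A ranking that already places items in decreasing order of real rating yields an NDCG of exactly 1, and any other ordering yields a smaller value.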

Comparison methods. Our proposed SeSoRec is a novel secure social recommendation model, which is a secure version of Soreg [ma2011recommender]. We compare SeSoRec with the following latent factor models:

  • MF [mnih2007probabilistic] is a classic latent factor model, which only uses the user-item interaction information on the rating platform. This corresponds to the situation where the social platform is reluctant to share raw social information with the rating platform.

  • Soreg [ma2011recommender] is a classic social recommendation model, which is unsecure in the sense that the rating platform needs the raw data of the social platform.

Note that we do not compare with state-of-the-art recommendation methods, for two reasons: (1) most of them assume that the recommendation platform has many different kinds of information, such as contextual information [rendle2010factorization], which would make the comparison unfair to our method, and (2) our focus is to study the difference between traditional unsecure social recommendation models and our proposed secure social recommendation model.

Dataset     Metric   MF       Soreg    SeSoRec
Epinions    RMSE     1.2687   1.1791   1.1789
            NDCG     0.0363   0.0405   0.0401
FilmTrust   RMSE     1.1907   1.1754   1.1752
            NDCG     0.2042   0.2128   0.2124
Douban      RMSE     0.7489   0.7420   0.7419
            NDCG     0.0749   0.0780   0.0778

Table 2: Performance comparison on three datasets, including RMSE and NDCG.

Dimension   100      1000     10000
SSMM        0.0025   0.3246   40.744
TISMM       0.0060   0.7279   105.83

Table 3: Running time comparison (in seconds) of SSMM and TISMM.

Hyper-parameters. We fix the latent factor dimension and the batch size, and vary the social regularizer parameter to study its effect on SeSoRec. For the other parameters, e.g., the regularizer and the learning rate, we use grid search to find the best values for each model.

6.2 Comparison Results (To Q1)

We report the comparison results on the three datasets in Table 2, from which we observe the following. (1) Soreg and SeSoRec consistently outperform MF. Moreover, the sparser the dataset is, the more Soreg and SeSoRec improve over MF. Taking RMSE as an example, the Table 2 entries correspond to relative improvements of SeSoRec over MF of about 7.08%, 1.30%, and 0.93% on Epinions, FilmTrust, and Douban, whose rating densities are 0.48%, 1.14%, and 1.4%, respectively. The results indicate that social information is indeed important to recommendation performance, especially when data is sparse. (2) Soreg and SeSoRec achieve almost the same recommendation accuracy; the small differences come from the fixed-point decimal numbers used in secret sharing. This result further validates the correctness of our proposed SSMM, in addition to the theoretical proof.
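The remark about fixed-point numbers can be illustrated with a small sketch of additive secret sharing over an integer ring. The ring size, the number of fractional bits, and all function names here are assumptions for illustration; the source does not state the encoding actually used.

```python
import secrets

RING = 2**64       # assumed ring size for the shares
FRAC_BITS = 16     # assumed number of fractional bits

def encode(x: float) -> int:
    """Map a real number to a ring element with FRAC_BITS fractional bits."""
    return round(x * 2**FRAC_BITS) % RING

def decode(v: int) -> float:
    # Interpret the upper half of the ring as negative numbers.
    if v >= RING // 2:
        v -= RING
    return v / 2**FRAC_BITS

def share(v: int):
    """Split v into two additive shares that sum to v modulo RING."""
    s0 = secrets.randbelow(RING)
    return s0, (v - s0) % RING

def reconstruct(s0: int, s1: int) -> int:
    return (s0 + s1) % RING

x = -1.1789
s0, s1 = share(encode(x))
x_back = decode(reconstruct(s0, s1))
rounding_error = abs(x - x_back)
```

Sharing and reconstruction are exact in the ring, so the only deviation is the rounding in `encode`, which is at most half of the smallest representable step. Deviations of this kind are consistent with the fourth-decimal differences between Soreg and SeSoRec in Table 2.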

6.3 Comparison between SSMM and TISMM (To Q2)

As described in the SSMM section, the existing Trusted Initializer based Secure Matrix Multiplication (TISMM) [de2017efficient] needs a trusted initializer (a trusted third party) to generate secrets before computation. Although TISMM may not be applicable in practice, we compare the efficiency of our proposed SSMM with it. To this end, we randomly generate two square matrices P and Q of a given dimension and report the running time (in seconds) of calculating PQ with both algorithms in Table 3, measured over a local area network. SSMM takes much less time than TISMM, with a speedup of around 2.4 times on average. This is because TISMM needs to generate more random matrices and involves more matrix operations. Moreover, SSMM does not rely on a trusted initializer, which may be difficult to find in practice, and is therefore more practical.
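The average speedup can be checked directly from the running times in Table 3. The sketch below only restates that arithmetic; the variable names are illustrative.

```python
# Running times in seconds, copied from Table 3.
dims = [100, 1000, 10000]
ssmm = [0.0025, 0.3246, 40.744]
tismm = [0.0060, 0.7279, 105.83]

# Speedup of SSMM over TISMM at each matrix dimension.
speedups = [t / s for s, t in zip(ssmm, tismm)]
average_speedup = sum(speedups) / len(speedups)

for d, sp in zip(dims, speedups):
    print(f"dimension {d}: TISMM / SSMM = {sp:.2f}")
print(f"average speedup = {average_speedup:.2f}")
```

The per-dimension ratios are roughly 2.4, 2.2, and 2.6, which average to about 2.4.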

6.4 Parameter Analysis (To Q3)

Figure 1: Effect of the social regularizer parameter on the FilmTrust dataset. (a) Effect on RMSE. (b) Effect on NDCG@10.

Finally, we study the effect of the social regularizer parameter on SeSoRec. Social recommendation can be formalized as a basic factorization model plus a social information model, and the social regularizer parameter controls the contribution of the social information model to the final model performance. The larger the parameter is, the more likely it is that the latent factors of connected users are similar, and therefore the more the social information model contributes to the overall performance. Figure 1 shows its effect on the FilmTrust dataset in terms of both RMSE and NDCG@10. With a good choice of the parameter, SeSoRec balances the contribution of the user-item rating data on the rating platform and the user social data on the social platform, and thus achieves the best performance.
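The role of the social regularizer can be made concrete with a sketch of a Soreg-style objective: a squared-error factorization term plus a term that penalizes the distance between the latent factors of connected users. The exact objective is defined earlier in the paper; the form below, the names `gamma` and `lam`, and the dictionary-based data layout are assumptions for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def social_mf_loss(ratings, social, U, V, gamma, lam):
    """ratings: {(user, item): rating}; social: {(user, friend): strength};
    U, V: dicts mapping user / item ids to latent vectors (lists of floats)."""
    # Basic factorization model: squared error on the observed ratings.
    fit = 0.5 * sum((r - dot(U[i], V[j])) ** 2 for (i, j), r in ratings.items())
    # Social information model: connected users should have close latent factors.
    soc = 0.5 * sum(s * sq_dist(U[i], U[f]) for (i, f), s in social.items())
    # L2 regularization on all latent vectors.
    reg = 0.5 * sum(dot(v, v) for v in list(U.values()) + list(V.values()))
    return fit + gamma * soc + lam * reg

U = {0: [1.0, 0.0], 1: [0.0, 1.0]}
V = {0: [1.0, 1.0]}
ratings = {(0, 0): 1.0}
social = {(0, 1): 1.0}
loss_without_social = social_mf_loss(ratings, social, U, V, gamma=0.0, lam=0.0)
loss_with_social = social_mf_loss(ratings, social, U, V, gamma=1.0, lam=0.0)
```

With `gamma` set to zero the objective reduces to plain matrix factorization, while a larger `gamma` makes differing factors of connected users more costly, which pushes them together during training.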

7 Conclusion and Future Work

In this paper, we proposed a secret sharing based secure social recommendation framework, which can not only mine knowledge from the social platform to improve the recommendation performance of the rating platform, but also keep the raw data of both platforms secure. Specifically, we first formalized secure social recommendation as an MPC problem and proposed a SEcure SOcial RECommendation (SeSoRec) framework for it. We then proposed a novel Secret Sharing based Matrix Multiplication (SSMM) algorithm to optimize it, and proved its correctness and security. In addition, we showed that SeSoRec has linear communication and computation complexities and thus can scale to large datasets. Experimental results on real-world datasets demonstrated that SeSoRec achieves almost the same accuracy as the existing unsecure social recommendation model, and that SSMM significantly outperforms the existing trusted initializer based secure matrix multiplication protocol. In the future, we would like to find better solutions to the potential information leakage problem of SeSoRec.

References
