1 Introduction
Today, personalization is widely adopted across industries, from entertainment to precision medicine. The main enabling technology is recommender systems, which employ a variety of techniques to predict the preferences of human subjects (e.g. whether a user will like or dislike a movie). A typical system architecture is shown in Figure 1.
So far, many generic recommender algorithms have been proposed, as surveyed in [24]. Recently, deep learning has become a very powerful tool and has been applied to numerous domains, including recommender systems
[32]. Nevertheless, collaborative filtering recommender systems remain the most popular and well-known, due to their explainable nature (e.g. you like x, so you may also like y). Consider a set of m users with rating vectors over n items; let R = (r_{ui}) denote the rating matrix and K denote the set of pairs (u, i) such that user u has rated item i. One of the most popular collaborative filtering approaches is based on low-dimensional factor models, which derive two feature matrices P and Q from the rating matrix. The feature vector p_u denotes user u's interests and the feature vector q_i denotes item i's characteristics. Every feature vector has dimension d, which is typically much smaller than m and n. In implementations, P and Q are often computed by minimizing the following function:

min_{P,Q} Σ_{(u,i)∈K} (r_{ui} − p_u · q_i)² + λ_p ‖p_u‖² + λ_q ‖q_i‖²   (1)

for some positive regularization parameters λ_p and λ_q, typically through the stochastic gradient descent (SGD) method or its variants. Note that one advantage of latent-factor-based collaborative filtering is its better resistance to robustness attacks than neighbourhood-based approaches [20].

1.1 Privacy and Robustness Issues
Beyond likes and dislikes, users' preferences can support inferences about other sensitive attributes of individuals, e.g. religion, political orientation, and financial status. Even when a user participates in a recommender system under a pseudonym, there is a risk of re-identification. For instance, Weinsberg et al. [30] demonstrated that the set of items a user has rated can help an attacker identify that user. Privacy issues have long been recognized and many solutions have been proposed, as surveyed in [5, 14]. Robustness is about controlling the effect of manipulated inputs, and is a fundamental issue for recommender systems. Its importance is evident from numerous scandals, including fake book recommendations (https://tinyurl.com/y9nyo8y9), fake phone recommendations (https://tinyurl.com/ycc8lujh) and malicious medical recommendations (https://tinyurl.com/ybuevrwq). In their seminal work, Lam and Riedl [17] investigated shilling attacks, where a malicious company lies to the recommender system (i.e. injects fake profiles) to have its own products recommended more often than those of its competitors. Following this, a number of works have investigated different robustness attacks and corresponding countermeasures. Interestingly, Sandvig, Mobasher, and Burke [20] empirically showed that model-based algorithms are more robust than memory-based ones; Cheng and Hurley [10] proposed informed model-based attacks against trust-aware solutions, and demonstrated them against the privacy-preserving solution by Canny [9].
Clearly, robustness attacks threaten the business of the RecSys and subsequently degrade the quality of service for the users. Privacy is an increasing concern for privacy-aware users, and also for the RecSys when it wants to deploy a machine-learning-as-a-service business model [28]. Unfortunately, privacy and robustness have a complementary yet conflicting relationship. On the complementary side, privacy disclosure can enable more successful robustness attacks, since the attacker can adapt its strategy accordingly; conversely, a robust system reduces the attack surface for privacy attackers who inject fake profiles to infer honest users' information from the received outputs. On the conflicting side, a privacy-preserving recommender makes robustness attacks harder to combat, because detection algorithms do not work well when all users' inputs are kept private. We elaborate on this aspect in Section 3.
1.2 Our Contribution
In this paper, we aim at a comprehensive investigation of the privacy and robustness issues for recommender systems, considering both the model training and the prediction computing stages. To this end, we first provide a general system architecture and present a high-level security model accordingly. We then review the existing privacy-preserving latent-factor-based recommender solutions and identify their potential issues. In particular, we observe that most cryptographic solutions have mainly aimed at privacy protection in the model training stage without paying much attention to the prediction computing stage, which results in serious privacy issues in practice. We also highlight that existing privacy-preserving solutions make it harder to detect and prevent robustness attacks.
Towards privacy-preserving solutions that respect robustness attack detection, we address the model training and prediction computing stages separately. For the former, we show that existing solutions can be adapted; in particular, the adaptation is straightforward for expert-based ones such as that from [29]. For the latter, we propose two new cryptographic protocols, one of which involves an extra proxy. Our experimental results show that both protocols are very efficient on practical datasets. The employed privacy-by-design approach, namely returning only the unrated items whose approximated predictions are above a threshold, might have profound privacy implications; we leave a detailed investigation as future work.
1.3 Organisation
The rest of the paper is organised as follows. In Section 2, we introduce a generic recommender system architecture that consists of two stages: model training and prediction computing. Accordingly, we present a high-level security model. In Section 3, we analyse some representative privacy-preserving recommender solutions and identify their deficiencies in our security model. In Section 4, we present a solution framework demonstrating how to construct secure recommender solutions in our security model. In Section 5, we propose a new privacy-preserving protocol for prediction computing which does not involve a third-party proxy. In Section 6, we propose a more efficient privacy-preserving protocol for prediction computing that relies on a proxy. In Section 7, we conclude the paper.
2 System Architecture and Security Model
We assume the RecSys builds recommender models and offers recommendation as a service to users. Users who do not care about their privacy can offer their rating vectors directly to the RecSys to receive recommendations. In addition, the RecSys may collect as much non-private data as possible in order to build an accurate recommender model. We assume there are privacy-aware users who are unwilling to disclose their rating vectors yet still wish to receive recommendations. Our main objective is to design solutions that guarantee the following, from the viewpoint of a privacy-aware user Alice:

She receives highquality recommendations, by avoiding the robustness attacks mentioned in Section 1.1.

She minimizes the information disclosure about her rating vector, under the prerequisite that she receives highquality recommendations.
For our recommender as a service, we assume the system architecture shown in Figure 2. We note that existing collaborative filtering recommender systems typically have both the model training and prediction computing stages, even if they do not mention them explicitly. In addition, a proxy (i.e. a cloud computing facility) is often employed to carry out the massive computations (e.g. Netflix heavily uses Amazon cloud services). It is worth emphasizing that many privacy-preserving solutions (particularly cryptographic ones) also introduce such a third party, e.g. the crypto service provider in the terminology of [16] and [21]. The trust assumptions on the proxy can vary a lot across usage scenarios, and we elaborate on this later. Next, we briefly introduce what happens in the two stages.

In the model training stage, labelled in Figure 2, the RecSys trains a model, e.g. similarities between items (or users) in neighbourhood-based recommenders, or feature matrices for users and items in latent-factor-based ones, based on data from one or more sources. To clean the data and detect robustness attacks before the training, we suppose that the RecSys runs a detection algorithm D over the training dataset. To simplify our discussion, we assume the output of D is a binary bit for every input profile (i.e. rating vector). If it is 0, then the profile is deemed malicious and will not be used in the training.
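To make this binary interface concrete, here is a toy sketch in plain Python. It is our own illustrative heuristic, not a detector from the literature, and the names (`D`, `item_avg`, `tau`) are hypothetical: it keeps a profile only when its ratings stay reasonably close to the per-item population averages, a crude defence against shilling profiles that push items with extreme ratings.

```python
def D(profile, item_avg, tau=1.5):
    """Toy robustness filter with the binary interface described above.

    profile:  {item_id: rating} for one user (a sparse rating vector)
    item_avg: {item_id: average rating in the population}
    Returns 1 (keep the profile for training) or 0 (deem it malicious).
    """
    devs = [abs(r - item_avg[i]) for i, r in profile.items() if i in item_avg]
    if not devs:                 # nothing to compare against: keep by default
        return 1
    return 1 if sum(devs) / len(devs) <= tau else 0
```

An honest profile near the population averages passes, while a profile rating everything 5.0 against much lower averages is rejected; real detectors from the shilling-attack literature use far richer features (rating entropy, co-rating patterns, temporal bursts, and so on).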

After training, we refer to the output of the model training stage as a set of model parameters. Note that these parameters might be in encrypted form when privacy protection has been applied. In the prediction computing stage, the RecSys uses the model parameters, and possibly Alice's rating vector, to infer Alice's preferences.
2.1 The Proposed Security Model (high level)
We make the following general assumptions related to security. First, we assume the communication channel is secured with respect to confidentiality and integrity, in the following sense: (1) an honest user can be assured that his input will reach the RecSys or another intended party without being eavesdropped on or manipulated; (2) the RecSys can be assured that the honest user who initiates the communication will receive the message without it being eavesdropped on or manipulated. It is worth stressing that there is no guarantee that the RecSys knows the true identity of the user it is communicating with. Second, we assume that the RecSys is a rational player which offers recommendation as a service, and that users offer monetary rewards for receiving recommendations. Without this assumption, there can be no guarantee of privacy or robustness, because the RecSys could deviate from the protocol for any possible benefit.
Regarding robustness, we require that the RecSys is able to (efficiently) run any chosen detection algorithm D over the training dataset to identify the malicious profiles, i.e. rating vectors, as described at the beginning of this section. The output of D should be the same regardless of what privacy protection mechanisms have been deployed.
Regarding privacy, we consider the following specific requirements. Note that, similar to the semantic security of encryption schemes, indistinguishability-based games can be defined to formally capture all the requirements. We skip the details here, partly because the cryptographic primitives we use (e.g. homomorphic encryption) guarantee indistinguishability straightforwardly.

Alice's privacy against the RecSys. If the RecSys does not collude with the proxy, then it learns nothing about Alice's input and output except for the information implied by the output of D (i.e. whether or not Alice's profile is suspicious, if it has been used in the model training stage).

Alice’s privacy against Proxy. If the proxy does not collude with the RecSys, then it learns nothing about Alice’s input and output.

Alice’s privacy against other users. Other users do not learn more information about Alice’s rating vector than that implied in the legitimate outputs they receive.

RecSys’s privacy against Alice and other users. Alice and other users do not learn more information than that implied in the legitimate outputs they receive.
As a remark, in many existing solutions reviewed in Section 3.2, the legitimate outputs can contain too much private information. This has motivated our privacy-by-design approach in Section 4.1. As an informal requirement, when both the RecSys and the Proxy are compromised simultaneously, the information leakage about the privacy-aware users' data should also be minimized. In this regard, we note that most existing solutions, except the expert-based ones, leak everything.
3 Literature Work and Standing Challenges
Efficiency aside, designing a secure recommender system is a very challenging task. For example, applying statistical disclosure mechanisms does not guarantee security, as Zhang et al. [31] showed how to recover perturbed ratings in the solutions by Polat and Du [23]. Employing advanced cryptographic primitives is not a panacea either, as Tang and Wang [27] pointed out a vulnerability in the homomorphic encryption based solution by Jeckmans et al. [15]. Next, we analyse some representative solutions from the literature and identify the standing challenges.
3.1 Preliminary on Building Blocks
We write x ←$ S to denote that x is chosen from the set S uniformly at random. A public key encryption scheme consists of three algorithms (KeyGen, Enc, Dec): KeyGen generates a key pair (pk, sk); Enc(pk, m) outputs a ciphertext c; Dec(sk, c) outputs a plaintext m. Some schemes, e.g. Paillier [22], are additively homomorphic, meaning there is an operator ⊕ such that Enc(m1) ⊕ Enc(m2) = Enc(m1 + m2). Some recent somewhat homomorphic encryption (SWHE) schemes are both additively and multiplicatively homomorphic, up to a certain number of operations: there are operators ⊕ and ⊗ such that Enc(m1) ⊕ Enc(m2) = Enc(m1 + m2) and Enc(m1) ⊗ Enc(m2) = Enc(m1 · m2). In practice, one of the most widely used SWHE libraries is the Simple Encrypted Arithmetic Library (SEAL) from Microsoft [11], an optimized implementation of the YASHE scheme [6]. Note that homomorphic subtraction can be defined directly from these operators, with similar computational cost.
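To make the additive homomorphism concrete, here is a minimal toy implementation of Paillier in plain Python. It is a sketch for intuition only: the primes are far too small for any real use (production deployments use moduli of 2048 bits or more), and no side-channel or encoding issues are handled.

```python
import random
from math import gcd

p, q = 999983, 1000003               # well-known small primes; NOT secure
n, n2 = p * q, (p * q) ** 2
g = n + 1                            # standard choice of generator
lam = (p - 1) * (q - 1)              # phi(n), a valid decryption exponent
mu = pow(lam, -1, n)                 # since L(g^lam mod n^2) = lam when g = n+1

def enc(m):
    """Paillier encryption: c = g^m * r^n mod n^2 for random r coprime to n."""
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    """Paillier decryption via L(x) = (x - 1) / n."""
    L = (pow(c, lam, n2) - 1) // n
    return (L * mu) % n

def he_add(c1, c2):                  # Enc(m1) (+) Enc(m2) = Enc(m1 + m2)
    return (c1 * c2) % n2

def he_scalar_mul(c, k):             # Enc(m)^k = Enc(k * m)
    return pow(c, k, n2)
```

For example, `dec(he_add(enc(5), enc(7)))` returns 12 without either ciphertext ever being decrypted individually; this is the kind of ⊕ operator used throughout the protocols in this paper.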
3.2 Examining some Cryptographic Solutions
Cryptographic solutions aim at minimizing information leakage in the computation process by treating the recommender as a large-scale multiparty computation protocol. When designing privacy-preserving solutions, it has become common practice to introduce one or more third parties, not all of which are assumed to collude with each other, in order to eliminate a single trusted third party and improve efficiency. Nikolaenko et al. [21] and Kim et al. [16] introduced a CSP (i.e. crypto service provider) and employed garbled circuits and homomorphic encryption, respectively, to perform privacy-preserving matrix factorization. These solutions emphasize the matrix factorization step (i.e. the model training stage) while paying little attention to the prediction computing stage. In [21], it is proposed that every user is given his own feature vector, so that he can interact with the RecSys and CSP to retrieve predictions on all items. In reality, a user needs to know neither his feature vector nor the predictions for all items; he only needs to know the items he might like. In more detail, there are several concerns.

Given that the number of items is far smaller than the user population, a small number of colluding users can recover the item feature matrix Q, based on which they can try to infer information about the rest of the population. This leads to unnecessary information leakage against the honest users.

The malicious users might make illegal use of the recovered item feature matrix, e.g. by providing recommendation services via technologies such as incremental matrix factorization. Besides the potential privacy concern, this may hurt the business model of the RecSys.

Privacy-preserving mechanisms, such as encryption and garbled circuits, make it very difficult to detect Sybil attacks, where an attacker injects fake profiles into the system and then can (1) try to infer private information from the outputs to these fake profiles and (2) mount robustness attacks. Canny [8] used zero-knowledge proofs to defend against ill-formed profiles (i.e. ratings outside the legitimate range), but this is not effective against Sybil attacks. In real-world robustness attacks, the forged rating vectors are always well-formed (though the rating values follow maliciously chosen distributions); otherwise the RecSys could easily identify the ill-formed ones in plaintext. To detect and prevent robustness attacks, dedicated detection algorithms need to be executed on the input rating vectors in privacy-preserving solutions.
When training a recommender model, it is not always necessary to take into account the ratings from all possible users. Amatriain et al. [2] introduced a recommender system based on expert opinions, and showed that the recommendation accuracy can be reasonably good even if a target user's data is not used in training the model. Following this concept, Ahn and Amatriain [1] proposed a privacy-preserving distributed recommender system, and a similar concept has been adopted in [26, 29]. The solution from [29] is particularly interesting because it leads to very efficient constructions. We briefly summarize it below.

In the model training stage, suppose the expert dataset consists of the experts' rating vectors. The model parameters comprise two independent item feature spaces A and B, the global rating average, the per-user and per-item average ratings, and the user and item bias vectors. For a user who has not rated an item, his preference is formulated as a function of these parameters, as shown in Equation (2). Similar to other solutions, SGD can be used to learn the parameters.

In the prediction computing stage, suppose the user is not in the expert dataset and has his own rating vector and rating average; the prediction for an unrated item is then computed from these together with the trained parameters, as shown in Equation (3).
According to their experimental results, the accuracy of the predictions is almost the same as that of state-of-the-art recommender systems, even though the user is not required to be involved in the model training stage. Building on this, Wang et al. [29] further proposed an efficient privacy-preserving protocol based on the Paillier encryption scheme, so that the prediction (i.e. Equation (3)) can be computed in encrypted form. Unfortunately, their solution allows a malicious user, or several such users, to straightforwardly recover values functionally equivalent to the model parameters by solving simple linear equations. This attack poses a severe threat to the recommendation-as-a-service objective and to the privacy of the RecSys claimed in [29].
3.3 Examining the DP-based Solutions
While cryptographic solutions might provide provable security for the computation, they do not consider the information leakage from the legitimate outputs. In particular, the inference against an honest user or a group of honest users might be very severe when the attacker effectively controls part of the population (e.g. by launching Sybil attacks). Following the seminal work of McSherry and Mironov [19], researchers have tried to apply the differential privacy concept to prevent information leakage from recommender outputs, e.g. [4, 13, 12].
One of the main issues with the DP-based approach is how to set the privacy parameter ε. Specific to recommender systems, it is unrealistic to predefine a privacy budget, because the recommender algorithm (i.e. the model training stage) will be executed hundreds, thousands, or more times. By the sequential composition theorem, the privacy guarantee degrades to kε after k executions of the recommender algorithm. To maintain a meaningful level of privacy protection, the parameter ε in every execution must therefore be so small that the recommendation accuracy is totally destroyed. Besides, most DP solutions assume a trusted curator (e.g. the RecSys), which means there is no privacy against this party. Other solutions (e.g. local differential privacy [25]) require no trusted curator, but they severely interfere with robustness attack detection. For example, a privacy-aware user who prefers a higher level of privacy protection might be prone to being classified as malicious, due to the extensive perturbation of his rating vector.
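The budget arithmetic behind this argument is just sequential composition; a short sketch makes the problem vivid (the concrete numbers are purely illustrative):

```python
def per_run_epsilon(total_eps, runs):
    """Under sequential composition, k runs at eps each cost k*eps overall,
    so a fixed total budget must be split across all executions."""
    return total_eps / runs

def laplace_scale(sensitivity, eps):
    """Scale b of the Laplace noise Lap(b) needed for eps-DP in one run."""
    return sensitivity / eps

# Retraining 1000 times under an overall budget of eps = 1 forces each run
# to use eps = 0.001, i.e. Laplace noise 1000x the query sensitivity.
```

With noise three orders of magnitude larger than the quantity being protected, the trained model is essentially random, which is the accuracy collapse described above.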
4 Modular Solution Constructions
In this section, we present modular solutions which secure both the model training and prediction computing stages. We first introduce a privacy-by-design concept to minimize information leakage from the outputs, and then describe two types of constructions. In the first, the RecSys trains its recommender model without relying on privacy-aware users' data; in the second, the RecSys needs privacy-aware users' data to train the model so that these users can receive meaningful recommendations. For notational purposes, we refer to them as the Expert-based Solution and the Self-based Solution, respectively.
Note that for both constructions, we defer the detailed description of the privacy-preserving protocols for the prediction computing stage to Sections 5 and 6.
4.1 Privacy by Design Concept
In Section 3.2, we have shown that the legitimate outputs in the solutions from [21, 29] contain a lot of unnecessary information and can leak the recommender model to a small group of malicious users. To avoid such problems, we enforce the privacy-by-design concept by restricting the output to any user to the unrated items whose predictions are above a threshold in the proposed prediction computing stage. This significantly reduces the leakage to the user and also allows more efficient protocol design. In reality, the predictions for many items can be quite close, so it is very subtle to return only the top-k (say k = 20) items. For example, for the MovieLens 1M Dataset with 1 million ratings from 6000 users on 4000 movies (https://grouplens.org/datasets/movielens/1m/), the distribution of predictions is shown in Figure 3, where the horizontal axis is the prediction value and the vertical axis is the number of predictions with that value. Note that all predicted ratings have been rounded to one decimal place. Intuitively, it makes more sense to return, for example, the unrated items whose predicted ratings are 4.9 or 5. Put another way, we only need to return the items whose predicted ratings fall into the set S = {4.9, 5.0}.
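In plaintext, the output restriction itself is a one-liner; the sketch below (with hypothetical names) returns exactly the unrated items whose rounded prediction lands in the accepted set, which is what the protocols in Sections 5 and 6 compute under encryption:

```python
def recommend(predictions, rated, accepted=frozenset([4.9, 5.0])):
    """predictions: predicted rating per item index (already computed);
    rated:       set of item indices the user has already rated.
    Only membership in `accepted` is revealed, never the raw predictions."""
    return [i for i, p in enumerate(predictions)
            if i not in rated and round(p, 1) in accepted]
```

The point of the design is that the user learns a short list of item indices rather than the full prediction vector, so colluding users gain far less leverage for reconstructing the model.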
4.2 Privacy-preserving and Robust Expert-based Solution
In this solution, we adopt the recommender algorithm from [29], which has the nice property that the privacy-aware user Alice does not need to share her rating vector with the RecSys to train the recommender model, and the model training process is very simple. Note that in some other expert-based recommender systems, Alice's data may likewise not be needed to train the model, but the training process is much more complex (i.e. the model often has to be retrained before recommendations can be generated for Alice).

In this solution, the model training stage is very straightforward. Given an expert dataset, the RecSys first runs any robustness attack detection algorithm D to filter out outliers and malicious profiles. Then, the RecSys learns the model parameters from the expert dataset, which is fully available to the RecSys. More information can be found in Section 3.2.
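The SGD-based parameter learning referred to here can be illustrated with a minimal biased matrix-factorization trainer. This is a sketch only: the actual model of [29] includes additional expert-specific parameters, and all names below are hypothetical.

```python
import random

def train_mf(ratings, n_users, n_items, d=8, lr=0.05, reg=0.02, epochs=100):
    """Biased matrix factorization fitted with plain SGD.
    ratings: list of (user, item, value) triples from the expert dataset."""
    random.seed(1)
    P = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(n_users)]
    Q = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(n_items)]
    bu, bi = [0.0] * n_users, [0.0] * n_items
    mu = sum(r for _, _, r in ratings) / len(ratings)    # global average
    for _ in range(epochs):
        for u, i, r in ratings:
            e = r - predict(mu, bu, bi, P, Q, u, i)      # prediction error
            bu[u] += lr * (e - reg * bu[u])
            bi[i] += lr * (e - reg * bi[i])
            for k in range(d):
                pu, qi = P[u][k], Q[i][k]
                P[u][k] += lr * (e * qi - reg * pu)
                Q[i][k] += lr * (e * pu - reg * qi)
    return mu, bu, bi, P, Q

def predict(mu, bu, bi, P, Q, u, i):
    """mu + b_u + b_i + p_u . q_i, the standard biased-factor prediction."""
    return mu + bu[u] + bi[i] + sum(pk * qk for pk, qk in zip(P[u], Q[i]))
```

Since the expert dataset is held in plaintext by the RecSys, this step needs no cryptography; the privacy machinery only enters in the prediction computing stage.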
Assume Alice is labelled as user u in the privacy-aware user group; the prediction computing stage then consists of the following steps.

User u generates a public/private key pair for an SWHE scheme, and shares the public key with the RecSys.

User u sends his encrypted rating vector and rating average to the RecSys, which may require the user to prove that the encrypted vector is well-formed, similar to what has been done in [8].

If everything is in order, the RecSys can homomorphically predict user u's preference on an unrated item, as shown in Equation (4).

4.3 Privacy-preserving and Robust Self-based Solution

In the model training stage, we need to augment existing privacy-preserving model training protocols, e.g. those from [21] and [16], to enable privacy-preserving robustness attack detection.

In the case of [21], we need to devise a larger garbled circuit, which first evaluates D and then passes only the unsuspicious inputs on to the matrix factorization procedure.

In the case of [16], we need to devise a cryptographic protocol that can evaluate the algorithm D on the same encrypted inputs as those used in the HE-based matrix factorization algorithm.
A seamless augmentation will depend on the specific robustness attack detection algorithm, so we skip the details in this paper; it could nevertheless be an interesting direction for future work.


At the end of the privacy-preserving matrix factorization, whether from [21] or [16], the RecSys possesses the feature matrices encrypted under an SWHE public key belonging to the CSP (the Proxy in our system architecture). The participants (i.e. user u, the RecSys, and the Proxy) then perform the following steps.

The RecSys homomorphically computes user u's preference on every unrated item.

For every unrated item, the RecSys selects a random number, homomorphically adds it to the encrypted prediction, and sends the blinded ciphertext to the Proxy. User u generates a Paillier public/private key pair and sends the public key to the Proxy.

The Proxy decrypts each blinded prediction and re-encrypts the plaintext under user u's Paillier public key.

For every item, the RecSys homomorphically removes the corresponding random number, obtaining the prediction encrypted under user u's key.
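The blinding used in these steps can be illustrated with plaintext arithmetic. This is a sketch of the idea only: in the protocol, the masking and unmasking happen homomorphically on ciphertexts (first under the Proxy's key, then under the user's), and the names below are hypothetical.

```python
import random

MOD = 2 ** 32        # stand-in for the plaintext space of the HE scheme

def blind(pred):
    """RecSys side: mask the prediction before handing it to the Proxy.
    The Proxy sees only `masked`, which is uniformly random in [0, MOD)."""
    r = random.randrange(MOD)
    return (pred + r) % MOD, r

def unblind(masked, r):
    """RecSys side, after the Proxy has decrypted and re-encrypted:
    removing r recovers the original prediction (now under the user's key)."""
    return (masked - r) % MOD
```

The point is that the Proxy, which holds the decryption key, never sees the true prediction, while the RecSys, which sees only ciphertexts, can still strip its own mask because the encryption is additively homomorphic.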

It is clear that the model training stage of our expert-based solution satisfies all our robustness and privacy expectations; the full privacy analysis depends on the protocols from Sections 5 and 6, since the steps described above leak no information, thanks to the encrypted operations and randomization. For the self-based solution, we can guarantee the same level of privacy and robustness protection, although it will clearly be less efficient than the expert-based one.
5 Privacy-preserving Prediction Computing
In this section, we describe a privacy-preserving protocol that lets user u learn the unrated items whose predictions fall into a set S, without relying on a proxy. Here |S| is a small integer, e.g. 2 or 3 in practice, referring to the example in the previous section. Observe that privacy-preserving protocols for the model training stage often output integer predictions (in encrypted form), because they scale the intermediary computation results to be compatible with cryptographic tools such as homomorphic encryption algorithms. We therefore assume that the RecSys possesses the encrypted prediction for every unrated item at the end of the privacy-preserving model training stage. We explicitly represent each prediction with respect to a unit, because in our protocol the recommendations are based only on the high-order part, while the low-order part is rounded off.
5.1 Description of the Proposed Protocol
At the beginning of the prediction computing stage, we suppose user u possesses two public/private key pairs: one for the Paillier scheme, set up in Sections 4.2 and 4.3, and a new key pair for an SWHE encryption scheme [18]. The public keys are shared with the RecSys. As shown in Figure 4, the protocol runs in two phases, parameterized by a security parameter.
In the reduction phase, the RecSys and user u round off the low-order part of the encrypted predictions. Specifically, for every unrated item, the following operations are carried out.

The RecSys first randomizes the encrypted prediction and sends the result to user u.

Then, user u obtains the randomized prediction value through decryption and rounds off its low-order part, obtaining an approximated (still randomized) prediction. Finally, user u encrypts this value under his own SWHE public key if the item is unrated, and encrypts a random value otherwise.

After receiving the ciphertext, the RecSys homomorphically removes the randomization noise, obtaining a ciphertext of the approximated prediction if the item is unrated, and a ciphertext of a random value otherwise.
In the evaluation phase, for every unrated item, the RecSys computes, through homomorphic subtractions and multiplications, a ciphertext that encrypts 0 if the underlying approximated prediction falls into S, and a non-zero value otherwise. In order to hide the non-zero values, the RecSys randomizes each result, e.g. by homomorphically multiplying in a random number, before returning it; user u decrypts the results to learn the indices of the recommended items.
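The core of the evaluation phase is a polynomial membership test. In plaintext it looks as follows (the protocol performs the same subtractions and multiplications homomorphically under SWHE; the function name and the blind range are hypothetical):

```python
import random

def membership_check(x, S):
    """Return 0 iff x is in S; otherwise a blinded non-zero value.
    x and the elements of S are the scaled integer predictions."""
    prod = 1
    for s in S:
        prod *= (x - s)        # a zero factor appears iff x == s for some s
    blind = random.randrange(1, 2 ** 32)
    return prod * blind        # hides the magnitude of non-zero results
```

Because the degree of the product equals |S|, keeping |S| at 2 or 3 also keeps the multiplicative depth of the homomorphic circuit small, which is what makes the SWHE evaluation cheap.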
5.2 Security and Performance Analysis
The operations in the protocol are carried out on encrypted data, and randomization has been applied to the predictions revealed to user u. As such, the protocol reveals only the desired items to user u and leaks nothing to the RecSys.
For Paillier, we set the modulus size to 2048 bits; for SWHE we use the Microsoft SEAL library, with the ciphertext and polynomial moduli selected accordingly. Using the Chinese Remainder Theorem, we select two 40-bit primes, 1099511922689 and 1099512004609, to represent the plaintext space. By packing 8192 plaintexts into one ciphertext, we can process 8192 multiplications in one homomorphic multiplication. On an Intel(R) Core(TM) i7-5600U CPU at 2.60 GHz with 8 GB RAM, the timings are summarized in Table 1.
Table 1 (benchmark timings): 31.30 ms, 12.88 ms, 8.50 s, 52.43 ms; partial: 39.63 ms, 207.76 ms, 70.28 ms, 742 s.
The numbers of the different cryptographic operations in the proposed protocol are summarized in Table 2. In the last column, we estimate the real-world running time based on the aforementioned benchmarking results, assuming the MovieLens 1M Dataset. Note that this dataset was also used in Section 4.1.
Table 2 (estimated running time): User 420 s; RecSys 998 s.
With respect to the MovieLens 1M Dataset, we consider the standard baseline where user u and the RecSys interactively rank the predictions and the RecSys returns the top-ranked items. In order to rank, user u and the RecSys need to perform a pairwise comparison of two predictions for the RecSys to learn their order. On the same computer as above, one comparison with the protocol from [7] costs 175.88 ms of computation for the user and 184.60 ms for the RecSys. Adopting a standard sorting algorithm to realise the ranking, the average computation time for the user and the RecSys is 8442.24 s and 8860.80 s, respectively. The communication delay is about 4800 s, assuming each round takes up to 100 ms as in [7]. It is clear that our protocol is much more efficient.
6 Privacy-preserving Prediction Computing with Proxy
In this section, we describe the protocol that relies on a proxy, and also provide corresponding analysis.
6.1 Description of the Proposed Protocol
To enable the new protocol, we make use of a key-homomorphic pseudorandom function F [3]: given F(k1, x) and F(k2, x), anybody can compute F(k1 + k2, x). We describe the two phases in Figures 5 and 6, respectively; as before, a security parameter is fixed.
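The PRF of [3] is lattice-based and only nearly key-homomorphic; for intuition, here is a different, toy construction in the random-oracle model, F(k, x) = H(x)^k over a multiplicative group, whose outputs combine as F(k1, x) · F(k2, x) = F(k1 + k2, x). It is illustrative only: the parameters are not secure, and this is not the construction used in the paper.

```python
import hashlib

P = 2 ** 127 - 1                 # a Mersenne prime; toy group modulus

def H(x):
    """Hash the input to a group element (random-oracle heuristic)."""
    return int.from_bytes(hashlib.sha256(str(x).encode()).digest(), "big") % P

def F(k, x):
    """Key-homomorphic PRF candidate: F(k, x) = H(x)^k mod P."""
    return pow(H(x), k, P)
```

Here the key homomorphism is multiplicative on outputs (the product of two evaluations equals the evaluation under the summed keys), which is exactly the property the Proxy exploits when it combines the user's and the RecSys's permuted values.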
Similar to the case shown in Figure 4, in the reduction phase the RecSys and user u interactively round off the low-order part of the prediction for every item. The main difference (and simplification) is that, at the end of the phase, the RecSys holds one value if the item has been rated and another otherwise, while user u holds the corresponding random number used for blinding.
The evaluation phase, shown in Figure 6, proceeds as follows.

User u first establishes random messages, two random permutation functions, and a hash function H with the RecSys. Applied to a vector of the appropriate length, each permutation function randomly permutes the order of the vector's elements.

User u chooses random keys for F and evaluates F on the agreed inputs under his keys, for every item. At the same time, the RecSys evaluates F on the same inputs under its own keys, for every item.

After receiving the permuted values from user u and the RecSys, the Proxy combines them element-wise via the key homomorphism of F. It is easy to check that, if an item is unrated, the combined value equals the PRF evaluation under the sum of the two keys.

User u first computes the PRF values under the summed keys for every item, and then computes a randomized check value vector for every item. He permutes the vector formed by the individual check value vectors of all items, and sends the result to the Proxy.

After receiving the check values from the user, the Proxy computes a binary vector: for every one of its combined values, if its hash value under H appears in the corresponding check value vector, the corresponding bit is set to 1; otherwise it is set to 0.

From this binary vector and the agreed permutations, user u can identify the unrated items whose approximated predictions fall into the set S.
6.2 Security and Performance Analysis
Thanks to the encryption, the reduction phase leaks no information to either party. In the evaluation phase, the RecSys does not learn anything because it receives nothing from the others, while user u only learns which items are recommended. Regarding the information leakage to the Proxy, it suffices to discuss a single item, because independent keys and messages are used for different items. Since the PRF keys are chosen independently and at random, the PRF values are random in the view of the Proxy. With the hash function modelled as a random oracle, a check value leaks no information if the item has been rated, and otherwise it only reveals whether there is a match and nothing else. The random permutations hide which items have been recommended to user u, while the random messages hide the predicted rating values of the recommended items. With respect to the security model from Section 2.1, the solution leaks the number of recommended items to the Proxy, whereas the model requires no leakage at all. To reduce this leakage, we can replace Steps 4-6 with a privacy-preserving set intersection protocol. We leave a detailed investigation of this issue as future work.
We summarize the asymptotic complexity in Table 3. Based on the reference code by the authors of [3] ^{5}^{5}5https://github.com/cpeikert/Lol/tree/master/lolapps, the and take about 1.04 ms and 10 s, respectively. W.r.t. the MovieLens 1M Dataset and , we compute the real-world running time and report it in the last column of Table 3. It is clear that the existence of the Proxy greatly improves the efficiency without seriously downgrading the privacy guarantee.
Party   Time
User    63.52 s
RecSys  4.16 s
Proxy   40 ms
7 Conclusion
In this paper, we have demonstrated how to construct privacy-preserving collaborative filtering recommenders by separately addressing the privacy issues in the model training and prediction computation stages. We argued that the expert-based approach (e.g. [29]) provides a more scalable solution for the model training stage, while the efficiency of existing cryptographic solutions (e.g. [21] and [16]) remains a challenge, particularly given the need to support robustness attack detection. By leveraging homomorphic encryption and key-homomorphic pseudorandom functions, we showed that the proposed privacy-preserving prediction computation protocols are much more efficient than standard solutions. This paper leaves several interesting research questions open. One is to investigate the performance of cryptographic solutions when they are extended to support robustness attack detection, and to further improve their efficiency. Another is to formally study the privacy advantage of the privacy-by-design approach in providing recommendations to end users, and potentially link it to differential privacy. Yet another is to evaluate the performance (e.g. recommendation accuracy) of the two privacy-preserving protocols for the prediction computation stage on other widely used datasets such as Netflix.
Acknowledgement
This work is partially funded by the European Union's Horizon 2020 SPARTA project, under grant agreement No 830892. The author would like to thank his former colleague Jun Wang for producing Figure 3 and his current colleague Bowen Liu for running the experiment in Section 6.2.
References
 [1] (2010) Towards fully distributed and privacy-preserving recommendations via expert collaborative filtering and RESTful linked data. In 2010 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2010, pp. 66–73. Cited by: §3.2.
 [2] (2009) The wisdom of the few: a collaborative filtering approach based on expert opinions from the web. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 532–539. Cited by: §3.2.
 [3] (2014) New and improved key-homomorphic pseudorandom functions. In Advances in Cryptology - CRYPTO 2014, pp. 353–370. Cited by: §6.1, §6.2.
 [4] (2015) Applying differential privacy to matrix factorization. In Proceedings of the 9th ACM Conference on Recommender Systems, pp. 107–114. Cited by: §3.3.
 [5] (2013) Social media retrieval. pp. 263–281. Cited by: §1.1.
 [6] (2013) Improved security for a ring-based fully homomorphic encryption scheme. In Cryptography and Coding – 14th IMA International Conference, pp. 45–64. Cited by: §3.1.
 [7] (2015) Machine learning classification over encrypted data. In 22nd Annual Network and Distributed System Security Symposium, NDSS 2015, Cited by: §5.2.
 [8] (2002) Collaborative filtering with privacy. In IEEE Symposium on Security and Privacy, pp. 45–57. Cited by: 3rd item, item 2b.
 [9] (2002) Collaborative filtering with privacy via factor analysis. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 238–245. Cited by: §1.1.
 [10] (2009) Trading robustness for privacy in decentralized recommender systems. In Proceedings of the Twenty-First Conference on Innovative Applications of Artificial Intelligence, pp. 3–15. Cited by: §1.1.
 [11] (2017) Manual for using homomorphic encryption for bioinformatics. Proceedings of the IEEE 105 (3), pp. 552–567. Cited by: §3.1.
 [12] (2006) Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography, Third Theory of Cryptography Conference, pp. 265–284. Cited by: §3.3.
 [13] (2006) Differential privacy. In Automata, Languages and Programming, 33rd International Colloquium, ICALP 2006, M. Bugliesi, B. Preneel, V. Sassone, and I. Wegener (Eds.), LNCS, Vol. 4052, pp. 1–12. Cited by: §3.3.
 [14] (2015) Recommender systems handbook. pp. 649–688. Cited by: §1.1.
 [15] (2013) Efficient privacy-enhanced familiarity-based recommender system. In Computer Security - ESORICS 2013, pp. 400–417. Cited by: §3.
 [16] (2016) Efficient privacy-preserving matrix factorization via fully homomorphic encryption: extended abstract. In Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, pp. 617–628. Cited by: §2, §3.2, item 1, 2nd item, item 2, §4.3, §7.
 [17] (2004) Shilling recommender systems for fun and profit. In Proceedings of the 13th International Conference on World Wide Web, pp. 393–402. Cited by: §1.1.
 [18] (2016) Note: https://sealcrypto.codeplex.com/ Cited by: §5.1.
 [19] (2009) Differentially private recommender systems: building privacy into the Netflix prize contenders. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 627–636. Cited by: §3.3.
 [20] (2006) Model-based collaborative filtering as a defense against profile injection attacks. In Proceedings of the 21st National Conference on Artificial Intelligence - Volume 2, pp. 1388–1393. Cited by: §1.1, §1.
 [21] (2013) Privacy-preserving matrix factorization. In Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, pp. 801–812. Cited by: §2, §3.2, item 1, 1st item, item 2, §4.1, §4.3, §7.
 [22] (1999) Public-key cryptosystems based on composite degree residuosity classes. In Advances in Cryptology - EUROCRYPT 1999, pp. 223–238. Cited by: §3.1.
 [23] (2003) Privacy-preserving collaborative filtering using randomized perturbation techniques. In Proceedings of ICDM 2003, pp. 625–628. Cited by: §3.
 [24] (2011) Evaluating recommendation systems. In Recommender Systems Handbook, pp. 257–297. Cited by: §1.
 [25] (2016) EpicRec: towards practical differentially private framework for personalized recommendation. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 180–191. Cited by: §3.3.
 [26] (2017) Privacy-preserving hybrid recommender system. In The Fifth International Workshop on Security in Cloud Computing (SCC), pp. 59–66. Cited by: §3.2.
 [27] (2015) Privacy-preserving context-aware recommender systems: analysis and new solutions. In Computer Security - ESORICS 2015, pp. 101–119. Cited by: §3.
 [28] (2016) Stealing machine learning models via prediction APIs. In Proceedings of the 25th USENIX Conference on Security Symposium, pp. 601–618. Cited by: §1.1.
 [29] (2019) Novel collaborative filtering recommender friendly to privacy protection. Cited by: §1.2, §3.2, §4.1, §4.2, §7.
 [30] (2012) BlurMe: inferring and obfuscating user gender based on ratings. In Sixth ACM Conference on Recommender Systems, pp. 195–202. Cited by: §1.1.
 [31] (2006) Deriving private information from randomly perturbed ratings. In Proceedings of the Sixth SIAM International Conference on Data Mining, pp. 59–69. Cited by: §3.
 [32] (2017) Deep learning based recommender system: a survey and new perspectives. Note: https://arxiv.org/abs/1707.07435 Cited by: §1.