How Much Are You Willing to Share? A "Poker-Styled" Selective Privacy Preserving Framework for Recommender Systems

06/04/2018 ∙ by Manoj Reddy Dareddy, et al.

Most industrial recommender systems rely on the popular collaborative filtering (CF) technique for providing personalized recommendations to their users. However, the very nature of CF is adversarial to the idea of user privacy, because users need to share their preferences with others in order to be grouped with like-minded people and receive accurate recommendations. While previous privacy preserving approaches have been successful inasmuch as they concealed user preference information to some extent from a centralized recommender system, they have also, nevertheless, incurred significant trade-offs in terms of privacy, scalability, and accuracy. They are also vulnerable to privacy breaches by malicious actors. In light of these observations, we propose a novel selective privacy preserving (SP2) paradigm that allows users to custom define the scope and extent of their individual privacies, by marking their personal ratings as either public (which can be shared) or private (which are never shared and are stored only on the user's device). Our SP2 framework works in two steps: (i) first, it builds an initial recommendation model based on all the public ratings shared by users, and (ii) then this public model is fine-tuned on each user's device based on the user's private ratings, eventually learning a more accurate model. Furthermore, in this work, we introduce three different algorithms for implementing an end-to-end SP2 framework that can scale effectively from thousands to hundreds of millions of items. Our user survey shows that an overwhelming fraction of users are likely to rate many more items to improve the overall recommendations when they can control which ratings will be publicly shared with others.




1. Introduction

Collaborative filtering (CF) based recommender systems are ubiquitously used across a wide spectrum of online applications ranging from e-commerce (e.g. Amazon) to recreation (e.g. Spotify, Netflix, Hulu, etc.) for delivering a personalized user experience (Mishra and Reddy, 2016). CF techniques are broadly classified into two types: (i) classic nearest neighbor based algorithms (Takács et al., 2008) and (ii) more recent matrix factorization techniques (Koren et al., 2009), of which the latter has been predominantly adopted in industrial applications (Das et al., 2017) for building large-scale recommender models due to its superiority in terms of accuracy (Koren et al., 2009) and massive scalability (Oh et al., 2015; Karydi and Margaritis, 2016; Schelter et al., 2013; Meng et al., 2016; Xing, 2015; Li et al., 2016). Regardless of the underlying technique, the performance of a CF system is generally driven by the "homophilous diffusion" (Canny, 2002) process, where users must share some of their preferences in order to identify others with similar tastes and get good recommendations from them. The performance of CF algorithms often deteriorates without such adequate information, as observed in the classic cold start (Volkovs et al., 2017) problem.

This inherent need for a user to share his/her preferences sometimes leads to serious privacy concerns. To make things more complicated, privacy is not a static concept and may greatly vary across different users, items and places. For example, different users under changing geopolitical, social and religious influences may have varying degree of reservation about explicitly sharing their ratings on sensitive items that deal with subjects like politics, religion, sexual orientation, alcoholism, substance abuse, adultery, etc. (Chow et al., 2012). Overall, these privacy concerns can prevent a user from explicitly rating many items, which reduces the overall performance of a CF algorithm, as compared to an ideal scenario, where everyone freely rates all the items they consume.

1.1. Motivation

In this paper, we explore the idea of letting each user define his/her own privacy. In other words, the user decides which ratings he/she can comfortably share publicly with others, while the remaining ratings are considered private, meaning that they are stored only locally on the user's device and are never shared with anyone, including peers or a centralized recommender system. Thus, this scheme enables each user to selectively define his/her own privacy. Figure 1 shows an example of such an operational setup.

Figure 1. Working of a selective privacy preserving (SP2) framework from a user’s perspective.

In this paper, we attempt to build a CF framework that preserves each user's selective privacy, and we investigate the following issues in enabling such a framework:

How can we build a selective privacy preserving (SP2) CF model that assimilates information from two kinds of ratings – all users’ public ratings and each user’s on-device private ratings?

How can we ensure that there is no loss of private information in our SP2 framework?

Can the SP2 framework improve the performance of a CF algorithm? In other words, does the SP2 framework improve the overall recommendation quality at all by taking into account each user’s private ratings? Or should the users simply hold back from rating sensitive materials if they have any privacy concern?

Can this SP2 CF model ensure scalability with respect to industrial-scale datasets?

Interestingly, the selective privacy preserving framework proposed in this paper is somewhat analogous to the rules of a classic poker game, Omaha hold 'em, where each player tries to form the best hand by combining some of the community cards (which are publicly visible to everyone) and some of the hole cards (which are dealt privately to each player).

1.2. Contributions

In the rest of this paper, we address the questions listed in Section 1.1 and make the following contributions:

We mathematically formulate the selective privacy preservation problem and present a formal framework to study it (Section 2). To the best of our knowledge, this is the first work under the umbrella of federated machine learning (Konećný et al., 2016) that supports a private on-device recommendation model for CF algorithms.

We propose three different strategies (Section 3) for efficiently implementing an end-to-end SP2 framework, each of which is conducive to different situations. These underlying techniques overall ensure that a SP2 CF model incurs only a reasonable cost in terms of storage and communication overhead, even when dealing with massive industrial datasets or large machine learning models.

We present analytical results on two real datasets comparing different privacy preserving and data obfuscation techniques to show the effectiveness of our SP2 framework (Section 4). We also empirically study what a good information sharing strategy is for a user in a SP2 framework, and how much a user's recommendations are affected when he/she refrains from rating an item instead of marking it as private.

We present the results of a pilot study (Section 5), which demonstrates that an overwhelming majority of participants are willing to adopt this technology in order to receive more relevant recommendations without sacrificing their individual privacies.

2. SP2 Architecture

Our proposed selective privacy preserving (SP2) framework for CF algorithms is broadly based on the popular matrix factorization (MF) method, mainly due to its better performance, scalability and industrial applicability (Koren et al., 2009; Karydi and Margaritis, 2016; Schelter et al., 2013; Meng et al., 2016; Das et al., 2017). However, some of our discussions can also be extended to the traditional nearest neighbor based CF algorithms (Takács et al., 2008). We next briefly review the MF technique in Section 2.1.

2.1. Background

In the classic biased MF model (Koren et al., 2009), we try to learn the latent user and item factors (assumed to lie in the same feature space of dimension $f$) from an incomplete ratings matrix (Takács et al., 2008). More formally, the estimated rating of a user $u$ on an item $i$ is given by equation (1):

$$\hat{r}_{ui} = \mu + b_u + b_i + p_u^\top q_i \qquad (1)$$

The corresponding symbol definitions are provided in Table 1. We compute the user and item latent factors by minimizing the regularized squared error over all the known ratings, as shown in (2):

$$\min_{p_*,\, q_*,\, b_*} \sum_{(u,i) \in \mathcal{K}} (r_{ui} - \hat{r}_{ui})^2 + \lambda \left( b_u^2 + b_i^2 + \|p_u\|^2 + \|q_i\|^2 \right) \qquad (2)$$

This is done either using the classic Alternating Least Squares method (Takács and Tikk, 2012; Das et al., 2017; Meng et al., 2016), which computes closed-form solutions, or via Stochastic Gradient Descent (SGD) (Koren et al., 2009), which enjoys strong convergence guarantees (Ge et al., 2016; Lee et al., 2016) and many desirable properties for scalability (Oh et al., 2015; Keuper and Pfreundt, 2015). The variable update equations for SGD are given by (3):

$$b_u \leftarrow b_u + \gamma (e_{ui} - \lambda b_u), \quad b_i \leftarrow b_i + \gamma (e_{ui} - \lambda b_i), \quad p_u \leftarrow p_u + \gamma (e_{ui}\, q_i - \lambda p_u), \quad q_i \leftarrow q_i + \gamma (e_{ui}\, p_u - \lambda q_i) \qquad (3)$$

For simplicity, we assume from now on that the user and item factors contain the respective biases, i.e. the user factor for $u$ ($\bar{p}_u$) implies the column vector $[b_u \; p_u^\top]^\top$ and the item factor for $i$ ($\bar{q}_i$) refers to the column vector $[b_i \; q_i^\top]^\top$.

| Symbol | Definition | Symbol | Definition |
|---|---|---|---|
| $\mu$ | global mean of ratings | $\mathcal{K}$ | set of observed ratings |
| $b_u$ | bias for user $u$ | $p_u$ | latent vector for user $u$ |
| $b_i$ | bias for item $i$ | $q_i$ | latent vector for item $i$ |
| $\gamma$ | learning rate | $\lambda$ | regularization parameter |
| $r_{ui}$ | actual rating of $i$ by $u$ | $\hat{r}_{ui}$ | prediction of $u$'s rating for $i$ |
| $e_{ui}$ | calculated as $(r_{ui} - \hat{r}_{ui})$ | | |

Table 1. Definitions of symbols used in (1)-(3)
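As a concrete illustration, the biased MF model above can be sketched in a few lines of NumPy; the toy dataset, factor dimension, and hyper-parameter values below are illustrative choices, not the paper's settings.

```python
# Sketch of biased matrix factorization trained with SGD, following
# equations (1)-(3). All hyper-parameters here are illustrative.
import numpy as np

def train_mf(ratings, n_users, n_items, f=8, gamma=0.05, lam=0.02,
             epochs=200, seed=0):
    rng = np.random.default_rng(seed)
    P = rng.normal(0, 0.1, (n_users, f))      # user latent factors p_u
    Q = rng.normal(0, 0.1, (n_items, f))      # item latent factors q_i
    bu = np.zeros(n_users)                    # user biases b_u
    bi = np.zeros(n_items)                    # item biases b_i
    mu = np.mean([r for _, _, r in ratings])  # global ratings mean
    for _ in range(epochs):
        for u, i, r in ratings:
            e = r - (mu + bu[u] + bi[i] + P[u] @ Q[i])   # e_ui, eq. (1)
            bu[u] += gamma * (e - lam * bu[u])           # updates, eq. (3)
            bi[i] += gamma * (e - lam * bi[i])
            P[u], Q[i] = (P[u] + gamma * (e * Q[i] - lam * P[u]),
                          Q[i] + gamma * (e * P[u] - lam * Q[i]))
    return mu, bu, bi, P, Q

def predict(model, u, i):
    mu, bu, bi, P, Q = model
    return mu + bu[u] + bi[i] + P[u] @ Q[i]
```

On a handful of ratings this fits the observed entries closely; in practice the factors would be validated on held-out ratings rather than the training set.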

2.2. Problem Formulation

In a SP2 framework, each user $u$ has a set of public ratings, denoted by $\mathcal{P}_u$, and a set of private ratings, denoted by $\mathcal{S}_u$. However, since $\mathcal{S}_u$ is known only to $u$, the set of ratings observed here by the central recommender system is $\bigcup_u \mathcal{P}_u$. We denote the latter by the notation $\mathcal{K}_{pub}$. Now, our problem can be formulated as a multi-objective optimization problem, where we attempt to minimize $N$ regularized L2 loss functions together for $N$ users, as shown below:

$$\min \,(L_1, L_2, \ldots, L_N), \quad \text{where the L2 loss } L_u \text{ for user } u \text{ is given by}$$

$$L_u = \sum_{i \,:\, r_{ui} \in \mathcal{P}_u \cup \mathcal{S}_u} \left[ (r_{ui} - \hat{r}_{ui})^2 + \lambda \left( b_u^2 + b_i^2 + \|p_u\|^2 + \|q_i\|^2 \right) \right]$$

Note, multi-objective optimization problems are traditionally solved with classic techniques like linear scalarization, also known as the weighted sum method (Grodzevich and Romanko, 2006). In fact, if we assign equal weights to each user's L2 loss function, then linear scalarization reduces this problem to a single-objective optimization problem (constructed as the weighted sum of the individual objective functions), which is similar to the one discussed in Section 2.1. However, due to privacy considerations, all of the data (users' ratings) cannot be pooled together; this makes classic solutions to multi-objective optimization problems inapplicable here. We next outline a privacy-aware model to solve this problem.

2.3. Model

We posit the following assumptions before summarizing our model.
Assumption 1. The central recommender system is semi-adversarial in nature i.e. it logs any information requested by a user and can later utilize it to guess what the user has rated privately.
Assumption 2. The central recommender system is not malicious in nature i.e. it will not deliberately send incorrect information to a user to adversely impact his/her recommendations. It has an incentive to provide high quality recommendations to the users.
Framework. Based on the earlier discussions, we now outline the working of our SP2 framework:

The central recommender system first builds a public model from all the users' shared public ratings using SGD, obtaining the public user and item factors once the error converges after a certain number of epochs.

Each user then downloads his/her corresponding public user factor from the central recommender system.

Additionally, all users download common auxiliary public model data onto their devices. This data is the same for all users and hence can be broadcast by the central recommender system (with authentication, in case the server cannot be trusted).

Once the auxiliary public model data and the public user factor are locally available on the device, local updates are performed on the public user factor using the auxiliary model information and the private ratings, which the user has saved on the device and never shared with anyone.

The final private user factor and the private model are stored on the user’s device and never shared or communicated.

Figure 2. Architecture of a selective privacy preserving (SP2) framework.

Figure 2 presents the overall architecture. Interestingly, in our framework users never upload or communicate any private rating, even in encrypted form, thus guaranteeing privacy preservation. This is notably different from the general federated machine learning philosophy (Konećný et al., 2016; Bonawitz et al., 2017). We elaborate on the need for this difference in Section 6.

Auxiliary public model data. It is also important to note that the central recommender system cannot share many parts of the public model with all the users. In many systems like Netflix, users consent to rate a video only on the condition that the central recommender system displays the average rating for the video, instead of the individual ratings from the user community. In such cases, the auxiliary public model data cannot contain individual ratings from the public ratings set, as that scenario also constitutes a user-privacy breach, since users may not be comfortable sharing their explicit ratings information with each other. In the same vein, consider the example where the auxiliary public model data comprises both the public user factors and the item factors. This information alone is sufficient to identify the corresponding public ratings of other users with reasonable confidence, thus again breaching user privacy. Furthermore, even anonymizing this information is not enough to prevent privacy leaks, as demonstrated through de-anonymization attacks on the Netflix dataset (Narayanan and Shmatikov, 2008). Thus, the auxiliary public model data needs to be designed carefully, so that it not only facilitates building a better private model on the user's device, but also safeguards the SP2 framework from privacy breaches. In light of this, observe that the auxiliary public model data can comprise the public item factors alone. Each public item factor is updated over a set of users based on their public user factors and ratings; thus, the set of final public item factors alone does not constitute a user-privacy breach.

Private ratings distribution. For analyzing the efficacy of our SP2 framework, it is also important to consider how users privately rate an item. We examine two different hypotheses for modeling this:

Hypothesis 1 (H1). Users always decide independently which of their ratings are private. Formally, for any two users $u$ and $v$ who have rated an item $i$ with ratings $r_{ui}$ and $r_{vi}$ respectively, $P(r_{ui} \text{ is private} \mid r_{vi} \text{ is private}) = P(r_{ui} \text{ is private})$.

Hypothesis 2 (H2). Users do not decide independently which of their ratings are private. In other words, ratings for some items are more likely to be marked as private. Formally, using the same notation as above, $P(r_{ui} \text{ is private} \mid r_{vi} \text{ is private}) \neq P(r_{ui} \text{ is private})$.

In Section 4, we further discuss how private ratings are allocated in our experiments based on these two hypotheses.

3. Implementation

In this section, we present three approaches for implementing our SP2 framework.

3.1. Naive Approach

In this approach, the auxiliary public model data contains the entire item factor matrix (i.e. all the latent item vectors and their biases). Each user's on-device private model is then built following the steps shown in Algorithm 1. The update equations used in this algorithm are similar to the ones used in the MF model in Section 2.1.

Input: learning rate γ, regularization parameter λ, number of epochs; aux. public model data containing all latent item vectors q_i, item biases b_i and the global ratings mean μ; public user latent vector p_u and public user bias b_u for user u; private ratings S_u of u
Output: private user latent vector p'_u and private user bias b'_u for u
procedure BuildPrivateModelNaive
    p'_u ← p_u;  b'_u ← b_u
    for each epoch do
        for all r_ui in S_u do
            e_ui ← r_ui − (μ + b'_u + b_i + p'_u⊤ q_i)
            b'_u ← b'_u + γ(e_ui − λ b'_u)
            p'_u ← p'_u + γ(e_ui q_i − λ p'_u)
        end for
    end for
Algorithm 1 Naive method to build on-device private model
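A minimal NumPy sketch of Algorithm 1, assuming the auxiliary public model data (item factors, item biases, global mean) has already been downloaded to the device; the function name and hyper-parameter values are illustrative, not the paper's.

```python
# On-device fine-tuning of the public user factor on private ratings.
# Item factors and biases from the auxiliary public model data are held
# fixed, so nothing derived from the private ratings leaves the device.
import numpy as np

def build_private_model(p_u, b_u, Q, b_i, mu, private_ratings,
                        gamma=0.05, lam=0.02, epochs=100):
    p, b = p_u.copy(), b_u
    for _ in range(epochs):
        for i, r in private_ratings:
            e = r - (mu + b + b_i[i] + p @ Q[i])  # prediction error e_ui
            b += gamma * (e - lam * b)            # private user bias update
            p += gamma * (e * Q[i] - lam * p)     # private user factor update
    return p, b
```

After fine-tuning, the device predicts privately rated items far more accurately, while the server never sees the updated factor.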

3.1.1. Top-N recommendation

Once the private model is built for user $u$, we can locally predict the rating for any item, as shown in equation (1), using the private user factor and bias, since the item factors and biases are known for all items as part of the auxiliary public model data. These predictions can be ranked locally on the user device to provide the top-N recommendations.

3.1.2. Privacy Consideration

It is important to highlight some privacy considerations behind our naive approach:

Even though a user only needs the item factors of the privately rated items to compute the on-device private model, the user cannot simply fetch only those item factors from the central recommender system, since the requests themselves would reveal which items were rated privately (recall Assumption 1).

Consider an alternative scenario, where a user downloads only some additional irrelevant item factors to obfuscate the private user information. This requires downloading significantly fewer item factors than the entire item factor matrix, but it makes top-N computation infeasible locally: the user would then need to send the private user factor back to the server, which would allow the server to guess the user's private ratings. Similarly, sending a randomly perturbed private user factor back to the server can obfuscate the private information, but degrades the quality of the top-N recommendations.

Consider another alternative strategy, where the actual private user factor is sent along with multiple fake user factors, thereby obfuscating the private information and making it k-anonymous (McSherry and Mironov, 2009). However, upload speeds are considerably lower than download speeds. In addition, the overall computation and communication costs can increase by orders of magnitude, as the central servers need to compute multiple top-N recommendation lists for every user and then send all of them back.

It is important to note that the item factor matrix is downloaded only once, during model building. In some situations, this does not involve unreasonable communication or storage overhead from the user end. For example, the total size (in MB) of all the item factors of dimension $f$ for $n$ items is given by $8nf/2^{20}$, where each item factor is assumed to be an array of type double (8 bytes per value). Assuming $f = 100$, the download sizes of all the item factors (in raw uncompressed format) for real datasets like MovieLens (Wu et al., 2016) and Netflix (Wu et al., 2016) are 4MB and 10MB respectively. However, for large industrial datasets (like Amazon (McAuley et al., 2015)) with close to 1 million items, the raw size of all item factors (of dimension 100) grows linearly to around 763MB.
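The arithmetic behind these download sizes, under the stated assumptions (factor dimension 100, 8-byte doubles), can be checked directly:

```python
# Raw, uncompressed size of the item-factor download in MB,
# assuming f = 100 and 8 bytes per double, as in the text.
def item_factors_mb(n_items, f=100, bytes_per_double=8):
    return n_items * f * bytes_per_double / 2**20
```

For one million items this evaluates to roughly 763 MB, matching the figure quoted above.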

3.2. Clustering

We propose this method to ensure scalability of the SP2 framework as the number of items becomes large. The intuition behind this approach is that the public auxiliary model data should consist of a set of approximate item factors $\tilde{Q}$ that is much smaller than the set of all item factors $Q$, i.e. $|\tilde{Q}| \ll |Q|$. Each user $u$ then uses, for a private rating on item $i$, the approximate item factor $\tilde{q}_i$ instead of the actual item factor $q_i$ to compute the private model. This approximation introduces an error in the $e_{ui}$ calculation for each private rating, which grows with the distance $\|q_i - \tilde{q}_i\|$. Ideally, for each user $u$ we would minimize these approximation errors across all of his/her private ratings, i.e. minimize $\sum_{i \in \mathcal{S}_u} \|q_i - \tilde{q}_i\|^2$. Since the central recommender system does not know $\mathcal{S}_u$ for any user, it instead prepares the public auxiliary model data by minimizing the approximation errors across all item factors, i.e. it minimizes $\sum_i \|q_i - \tilde{q}_i\|^2$. This minimization goal is the same as the objective function used in clustering (Kanungo et al., 2002). Thus, the central recommender system performs this approximation through clustering, particularly using $k$-means clustering with Euclidean distance (Arthur and Vassilvitskii, 2007); each cluster mean is treated as the approximate item factor for all the items in that cluster. In summary, the public auxiliary model data for this method comprises (1) the cluster centroids obtained after applying the $k$-means algorithm on all the item factors, (2) the cluster membership information, which identifies which cluster each item belongs to, and (3) the global ratings average. Using this public auxiliary model data, Algorithm 2 computes the on-device private model for each user.
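The server-side compression step can be sketched as follows. For illustration this uses a plain Lloyd's-style k-means with a simple farthest-point initialization rather than the k-means++ variant cited above; all names are illustrative.

```python
# Compress the item-factor matrix Q into k centroids plus a membership
# table; together these form the clustering method's auxiliary data.
import numpy as np

def compress_item_factors(Q, k, iters=20):
    # farthest-point initialization: greedily pick spread-out seeds
    centroids = [Q[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(Q - c, axis=1) for c in centroids], axis=0)
        centroids.append(Q[d.argmax()])
    centroids = np.array(centroids, dtype=float)
    member = np.zeros(len(Q), dtype=int)
    for _ in range(iters):
        # assign each item factor to its nearest centroid (Euclidean)
        d = np.linalg.norm(Q[:, None, :] - centroids[None, :, :], axis=2)
        member = d.argmin(axis=1)
        # move each centroid to the mean of its members
        for c in range(k):
            if np.any(member == c):
                centroids[c] = Q[member == c].mean(axis=0)
    return centroids, member
```

The device then substitutes `centroids[member[i]]` for the true factor of item `i` when running the on-device updates.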

Input: learning rate γ, regularization parameter λ, number of epochs; aux. public model data containing k cluster centers with latent vectors c_1, …, c_k, cluster biases, and the global ratings mean μ; the cluster membership function M, where item i is mapped to cluster M(i); public user latent vector p_u and public user bias b_u for user u; private ratings S_u of u
Output: private user latent vector p'_u and private user bias b'_u for u
procedure BuildPrivateModelClustered
    p'_u ← p_u;  b'_u ← b_u
    for each cluster c do
        calculate the number of items in c from the membership function M
    end for
    for each epoch do
        for all r_ui in S_u do
            e_ui ← r_ui − (μ + b'_u + b_{M(i)} + p'_u⊤ c_{M(i)})
            b'_u ← b'_u + γ(e_ui − λ b'_u)
            p'_u ← p'_u + γ(e_ui c_{M(i)} − λ p'_u)
        end for
    end for
Algorithm 2 Building on-device private model via clustering

Note, the cluster membership information for a set of $n$ items requires $4n$ bytes, assuming each cluster id is an integer taking 4 bytes. For $k$ clusters, this membership information can be further reduced drastically using bloom filters (Bloom, 1970; Marais and Bharat, 1997), where each bloom filter represents a cluster.

3.2.1. Top-N recommendation

Recall that in this method not all the item factors are available locally on the user device. Therefore, we pursue a different strategy here: user $u$ requests the public item factors for the top-$M$ recommended items (with $M \geq N$) from the central recommender system. The latter computes this list using $u$'s public user factor and then sends the top $M$ items and their corresponding public item factors to $u$. $u$ can re-rank these items based on his/her private user factor and then select the final top-$N$. Note, this top-$M$ computation by the central servers is not a privacy threat, as it can be calculated without any information about $u$'s private ratings. Also, recall Assumption 2 in Section 2.3, which ensures that incorrect top-$M$ information will not be sent by the central servers.
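The two halves of this exchange can be sketched as follows, with the server ranking a candidate pool by the public user factor and the device re-ranking those candidates with the private one; the pool size m, final size n, and all names are illustrative.

```python
# Server/device split of top-N computation in the clustering method.
import numpy as np

def server_top_m(p_pub, Q, m):
    # server ranks all items with the *public* user factor only
    scores = Q @ p_pub
    top = np.argsort(scores)[::-1][:m]
    return top, Q[top]            # candidate item ids + their factors

def device_top_n(p_priv, candidate_ids, candidate_Q, n):
    # device re-ranks the candidates with the *private* user factor
    scores = candidate_Q @ p_priv
    order = np.argsort(scores)[::-1][:n]
    return [int(candidate_ids[j]) for j in order]
```

Because the server only ever sees the public factor, the re-ranking step leaks nothing about the private ratings.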

3.3. Joint Optimization

Our previous approach was based on hard assignment, where each item is assigned to exactly one cluster. However, soft clustering techniques like non-negative matrix factorization (NMF) (Lee and Seung, 2000) consider each point as a weighted sum of different cluster centers. In this approach, we perform soft clustering on all the item factors simultaneously as the public recommendation model is built. In other words, the central recommender system jointly learns the public model and the soft cluster assignments. For this, we revise equations (1) and (2) to (4) and (5), where $C$ denotes the cluster center matrix of dimension $f \times c$ ($c$ being the number of clusters), and $w_i$ is a column vector representing the different (non-negative) cluster weights for item $i$:

$$\hat{r}_{ui} = \mu + b_u + b_i + p_u^\top C w_i \qquad (4)$$

$$\min \sum_{(u,i) \in \mathcal{K}_{pub}} (r_{ui} - \hat{r}_{ui})^2 + \lambda \left( b_u^2 + b_i^2 + \|p_u\|^2 + \|w_i\|^2 \right) \quad \text{s.t. } w_i \geq 0 \qquad (5)$$

This can be formulated as a constrained optimization problem, and Algorithm 3 shows how the central recommender system performs this joint optimization. One key aspect of the algorithm is that the weights are updated using projected gradient descent (PGD) (Lin, 2007), in order to ensure that all cluster weights remain non-negative. This facilitates finding the top-$t$ cluster assignments for any item by taking its $t$ highest corresponding weights. Finally, the auxiliary model data for this approach consists of the following: (1) the cluster center matrix $C$, (2) the item biases, (3) the top-$t$ cluster weights (in descending order) for each item $i$ together with the corresponding cluster ids, and (4) the global ratings mean. Using $C$ and the top-$t$ cluster weights for any item $i$, user $u$ can locally approximate the public item factor of $i$ by the weighted sum of its top-$t$ cluster centers, i.e. $q_i \approx \sum_j w_{ij} c_j$ (where $c_j$ represents the $j$-th cluster center). With this approximation, $u$ can again use Algorithm 1 to compute the on-device private model. Note, when $t$ is small, we save a significant communication cost by sending only the top-$t$ weights, as compared to the naive approach.

Input: learning rate γ, regularization parameter λ, number of epochs; public ratings r_ui for all users and items
Output: user factors p_u and biases b_u for all users; cluster center matrix C, cluster weights w_i and biases b_i for all items
procedure JointOptimizationMF
    Initialize p_u, b_u, b_i, C and w_i
    for each epoch do
        for all r_ui in K_pub do
            e_ui ← r_ui − (μ + b_u + b_i + p_u⊤ C w_i)
            b_u ← b_u + γ(e_ui − λ b_u);  b_i ← b_i + γ(e_ui − λ b_i)
            p_u ← p_u + γ(e_ui C w_i − λ p_u)
            C ← C + γ(e_ui p_u w_i⊤ − λ C)
            for each component w_ij of w_i do
                w_ij ← max(0, w_ij + γ(e_ui p_u⊤ c_j − λ w_ij))   // PGD
            end for
        end for
    end for
Algorithm 3 Joint optimization based matrix factorization
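The two ingredients specific to this method can be sketched in isolation: the projected gradient step that keeps the cluster weights non-negative, and the on-device approximation of an item factor from its top-t cluster weights. The variable names and gradient values below are illustrative stand-ins.

```python
# Projected gradient descent step for non-negative cluster weights,
# and the weighted-sum item-factor approximation q_i ≈ C w_i.
import numpy as np

def pgd_weight_step(w, grad, gamma=0.1):
    w = w - gamma * grad
    return np.maximum(w, 0.0)   # project onto the non-negative orthant

def approx_item_factor(C, weights, cluster_ids):
    # weighted sum of the selected (top-t) cluster centers; C is f x c
    return C[:, cluster_ids] @ weights
```

The projection is what makes the soft assignments interpretable: after clipping, the t largest entries of each weight vector identify the item's dominant clusters.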

3.3.1. Top-N recommendation

Interestingly, with the auxiliary model data for this method, user $u$ can locally compute the approximation for each item factor, as mentioned above. As a consequence, $u$ is also able to locally compute the top-N recommendations using these approximate item factors.

4. Experiments

We compared the performance of our SP2 framework with various baselines, as described next, under different settings on two real datasets, viz., MovieLens-100K (Harper and Konstan, 2015) data and a subset of Amazon Electronics (McAuley et al., 2015) data.

4.1. SP2 vs. Different Baselines

Absolute Optimistic (Everything public): Here, we assume that every user optimistically shares everything publicly without any privacy concern i.e. a single MF model is built on the entire training data itself. Theoretically, this should have the best performance, thus providing the overall upper bound.

Absolute Pessimistic (Everything private): Here, we assume that every user is pessimistic and does not share anything publicly due to privacy concerns. Thus, separate models are built for each user based only on his/her individual ratings, which in practice is as good as using that user's average rating for all of his/her predictions.

Only Public: This mimics the standard CF scenario, where privacy preserving mechanisms are absent. Consequently, users only rate the items they are comfortable sharing and refrain from explicitly rating sensitive items. We build a single MF model using only the public ratings and ignore the private ratings completely.

Distributed aggregation: Shokri et al. (Shokri et al., 2009) proposed peer-to-peer based data obfuscation policies, which obscured the user ratings information before uploading it to a central server that eventually built the final recommendation model. The three obfuscation policies mentioned are: (1) Fixed Random (FR) Selection: A fixed set of ratings are randomly selected from other peers for obfuscation. (2) Similarity-based Random (SR) Selection: A peer randomly sends a fraction of its ratings to the user for obfuscation depending on its similarity (Pearson, cosine, etc.) with the user. (3) Similarity-based Minimum Rating (SM) Frequency Selection: This approach is similar to the previous one, except that instead of randomly selecting the ratings, higher preference is given to the ratings of those items that have been rated the least number of times.

Fully decentralized recommendation: Berkovsky et al. (Berkovsky et al., 2007) proposed a fully decentralized peer-to-peer architecture, where each user requests a rating for an item by exposing a part of his/her ratings to a few trusted peers. The peers obfuscate their profiles by generating fake ratings and then compute their profile similarities with the user. Finally, the user computes the rating prediction for the item based on the ratings received from the peers and the similarities between them.

Differential Privacy: McSherry and Mironov (McSherry and Mironov, 2009) mask the ratings matrix by adding random noise, drawn from a normal distribution, to generate a noisy global average rating for each movie. These global averages are then used to generate fictitious ratings that further obscure the ratings matrix. This method ensures that the final model does not allow inference of the presence or absence of any individual user rating.

For all MF models, the hyper-parameters were initialized with default values from the Surprise package.

4.2. Private Ratings Allocation

We first provide the following two definitions, which are used later for the private ratings allocation:

User privacy ratio for a user $u$ is defined as the fraction of $u$'s total ratings that are marked private by $u$.

Item privacy ratio for an item $i$ is likewise defined as the fraction of users who rated $i$ that marked their rating of $i$ as private.

In order to examine the SP2 framework under two different hypotheses (stated in Section 2.3), we preprocess the datasets as discussed below:

H1. We generate user privacy ratios in the interval [0, 1] for all users from a beta distribution (Liu et al., 2017) with parameters (α, β). For each user $u$ with user privacy ratio $p$, a $(1-p)$ fraction of $u$'s ratings are randomly selected and marked as public, while the remainder of $u$'s ratings are considered private.

H2. Here, we generate item privacy ratios for all items from a beta distribution. For each item $i$ with item privacy ratio $p$, a $(1-p)$ fraction of the ratings assigned to $i$ are randomly selected and marked as public, while the remainder of $i$'s ratings are considered private.

For all our empirical analysis, we considered the following four beta distributions, as shown in Figure 3.

Mostly Balanced: most user/item privacy ratios are likely to be close to the theoretical mean value of 0.5.

Mostly Extreme: most users/items have either very high or very low privacy ratios; the overall average of the privacy ratios is still close to 0.5.

Mostly Pessimistic: most users/items have very high privacy ratios.

Mostly Optimistic: most users/items have very low privacy ratios.
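The H1 allocation above can be sketched as follows: draw a privacy ratio per user from a beta distribution, then randomly mark that fraction of the user's ratings as private. The (α, β) values and the data layout here are illustrative, not the paper's settings.

```python
# H1-style private/public split of each user's ratings, driven by a
# per-user privacy ratio drawn from a beta distribution.
import numpy as np

def allocate_private_h1(user_ratings, alpha, beta, seed=0):
    rng = np.random.default_rng(seed)
    public, private = {}, {}
    for u, ratings in user_ratings.items():
        p = rng.beta(alpha, beta)                 # user privacy ratio
        n_priv = int(round(p * len(ratings)))     # number of private ratings
        idx = rng.permutation(len(ratings))
        private[u] = [ratings[j] for j in idx[:n_priv]]
        public[u] = [ratings[j] for j in idx[n_priv:]]
    return public, private
```

The H2 variant is symmetric: the beta draw happens per item, and the split is applied over all ratings of that item rather than all ratings of a user.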

Figure 3. Probability density functions of four different beta distributions used in private ratings allocation.

4.3. Results

We evaluate our SP2 framework using both accuracy-based and ranking-based metrics. The 5-fold average RMSE and NDCG@10 scores (Liu et al., 2016), along with their corresponding standard deviations, are reported in Table 2 for the MovieLens and Amazon Electronics datasets.

| Category | Method | MovieLens RMSE | MovieLens NDCG@10 | Amazon RMSE | Amazon NDCG@10 |
|---|---|---|---|---|---|
| Peer-to-peer | Shokri et al. (FR) | 1.1624±0.00189 | 0.4873±0.0055 | 1.2216±0.01229 | 0.7757±0.00841 |
| Peer-to-peer | Shokri et al. (SR) | 1.1624±0.00562 | 0.4891±0.00773 | 1.2048±0.00889 | 0.7774±0.00686 |
| Peer-to-peer | Shokri et al. (SM) | 1.1447±0.00629 | 0.4922±0.0094 | 1.2028±0.00985 | 0.7748±0.00887 |
| Peer-to-peer | Berkovsky et al. | 1.132±0.00411 | 0.4876±0.00599 | 1.3405±0.00562 | 0.7619±0.00756 |
| Diff. Privacy | McSherry et al. | 1.201±0.00675 | 0.4795±0.00911 | 1.1349±0.00664 | 0.7719±0.00675 |
| Extreme | Abs. Pessimistic | 0.9632±0.00489 | 0.4132±0.00661 | 0.9788±0.00368 | 0.7379±0.00535 |
| Extreme | Abs. Optimistic | 0.8923±0.00576 | 0.5426±0.0072 | 0.9538±0.00955 | 0.788±0.00818 |
| Classic | Only Public (H1, Balanced) | 0.9183±0.00725 | 0.545±0.00726 | 0.971±0.00516 | 0.7892±0.00334 |
| Classic | Only Public (H1, Extreme) | 0.925±0.0075 | 0.5468±0.00688 | 0.9775±0.00455 | 0.788±0.00204 |
| Classic | Only Public (H1, Pessimistic) | 0.9518±0.00822 | 0.5363±0.00727 | 0.9957±0.00763 | 0.7738±0.00233 |
| Classic | Only Public (H1, Optimistic) | 0.9033±0.00641 | 0.5534±0.00179 | 0.96±0.00411 | 0.7955±0.00446 |
| Classic | Only Public (H2, Balanced) | 0.9206±0.00328 | 0.5391±0.00179 | 0.9692±0.00895 | 0.787±0.00463 |
| Classic | Only Public (H2, Extreme) | 0.9287±0.00228 | 0.528±0.00596 | 0.969±0.00931 | 0.7853±0.00242 |
| Classic | Only Public (H2, Pessimistic) | 0.9522±0.00213 | 0.517±0.00797 | 0.9851±0.00808 | 0.7718±0.00517 |
| Classic | Only Public (H2, Optimistic) | 0.9063±0.00294 | 0.5466±0.00212 | 0.9581±0.00946 | 0.7929±0.00456 |
| Selective | Naive (H1, Balanced) | 0.9051±0.00654 | 0.5558±0.00511 | 0.9613±0.00534 | 0.7991±0.00322 |
| Selective | Naive (H1, Extreme) | 0.9072±0.00873 | 0.5542±0.00727 | 0.9641±0.00555 | 0.7978±0.0012 |
| Selective | Naive (H1, Pessimistic) | 0.9316±0.0088 | 0.5444±0.00666 | 0.9808±0.00733 | 0.7868±0.00297 |
| Selective | Naive (H1, Optimistic) | 0.8953±0.00688 | 0.5594±0.00696 | 0.9526±0.00525 | 0.8048±0.00318 |
| Selective | Naive (H2, Balanced) | 0.907±0.00377 | 0.5514±0.00123 | 0.9589±0.00921 | 0.7977±0.00378 |
| Selective | Naive (H2, Extreme) | 0.914±0.00302 | 0.5383±0.00491 | 0.9603±0.00937 | 0.793±0.00222 |
| Selective | Naive (H2, Pessimistic) | 0.9316±0.00215 | 0.5274±0.00802 | 0.9705±0.00795 | 0.7824±0.00459 |
| Selective | Naive (H2, Optimistic) | 0.8946±0.0032 | 0.5532±0.00306 | 0.9517±0.00949 | 0.8034±0.00411 |
| Selective | Clustering (H1, Balanced) | 0.9165±0.00766 | 0.5457±0.00695 | 0.966±0.00565 | 0.7893±0.00353 |
| Selective | Clustering (H1, Extreme) | 0.9146±0.01183 | 0.5494±0.00795 | 0.9695±0.00774 | 0.7876±0.00309 |
| Selective | Clustering (H1, Pessimistic) | 0.9387±0.00854 | 0.5366±0.00681 | 0.9847±0.00741 | 0.7736±0.00241 |
| Selective | Clustering (H1, Optimistic) | 0.9037±0.00634 | 0.5538±0.00206 | 0.958±0.0048 | 0.7945±0.00373 |
| Selective | Clustering (H2, Balanced) | 0.9183±0.00395 | 0.5401±0.00172 | 0.9653±0.00924 | 0.7871±0.00464* |
| Selective | Clustering (H2, Extreme) | 0.9249±0.00206 | 0.5287±0.00598 | 0.9651±0.00938 | 0.7852±0.00228 |
| Selective | Clustering (H2, Pessimistic) | 0.9405±0.00166 | 0.5174±0.00718 | 0.9757±0.00812 | 0.7716±0.00528 |
| Selective | Clustering (H2, Optimistic) | 0.9047±0.00319 | 0.5473±0.00121 | 0.9566±0.00971 | 0.7926±0.00436 |
| Selective | Joint Opt. (H1, Balanced) | 0.9051±0.00654 | 0.556±0.00502 | 0.9612±0.00533 | 0.7989±0.00315 |
| Selective | Joint Opt. (H1, Extreme) | 0.9072±0.00873 | 0.5537±0.00735 | 0.964±0.00556 | 0.7975±0.00098 |
| Selective | Joint Opt. (H1, Pessimistic) | 0.9316±0.0088 | 0.5447±0.00646 | 0.9808±0.00734 | 0.7869±0.00338 |
| Selective | Joint Opt. (H1, Optimistic) | 0.8953±0.00689 | 0.5592±0.00706 | 0.9526±0.00524 | 0.8045±0.00318 |
| Selective | Joint Opt. (H2, Balanced) | 0.907±0.00377 | 0.551±0.00095 | 0.9589±0.0092 | 0.7978±0.00371 |
| Selective | Joint Opt. (H2, Extreme) | 0.914±0.00302 | 0.5383±0.00507 | 0.9602±0.00939 | 0.793±0.0022 |
| Selective | Joint Opt. (H2, Pessimistic) | 0.9319±0.00241 | 0.5278±0.00851 | 0.9705±0.00792 | 0.782±0.00479 |
| Selective | Joint Opt. (H2, Optimistic) | 0.8947±0.0032 | 0.5533±0.00289 | 0.9517±0.0095 | 0.8034±0.00423 |

* statistically insignificant

Table 2. Experimental results on MovieLens and Amazon Electronics datasets. Rows under each (H1)/(H2) variant correspond to the four beta distributions of Section 4.2 (Balanced, Extreme, Pessimistic, Optimistic).
(a) RMSE comparison
(b) NDCG comparison
Figure 4. Performance comparison among various baselines for different user privacy ratios on MovieLens dataset.
(a) Aux. data size comparison
(b) RMSE comparison
Figure 5. Comparing SP2 implementations for different number of clusters on Amazon Electronics dataset (preprocessed using mostly balanced beta distribution and H1).

As indicated by the results in Table 2, the peer-to-peer based techniques and the differential privacy method, which attempt to ensure complete user privacy from the central recommender system, end up performing worse than the standard only-public baseline because of their data obfuscation policies. In addition, the fully decentralized approach in (Berkovsky et al., 2007) is not scalable due to the limited number of trusted peers. In the same vein, the distributed aggregation approach in (Shokri et al., 2009) suffers from poor performance as the number of peers increases because of the higher obfuscation; lowering the number of peers, however, risks a significant privacy breach by the central recommender system. Table 2 further shows that our joint optimization approach (with only the top-3 cluster weights) performs as well as the naive approach. Our clustering approach for the SP2 framework performs worse than the naive and joint optimization approaches, but is substantially better than the only-public baseline across both evaluation metrics. Unless otherwise mentioned in the table, the p-value for all results related to the SP2 framework (computed using a two-tailed test with respect to the only-public baseline) is less than . As evident from the table, our results hold across both hypotheses. However, the performance of all the implementations improves as the privacy ratio decreases. This is further demonstrated in Figures 4(a) and 4(b), which plot the RMSE and NDCG values, respectively, against the average user privacy ratio across all users. Finally, Figures 5(a) and 5(b) present an ablation study of the performance and communication cost of the different SP2 implementations for a varying number of clusters. The naive method has the best performance but requires the largest amount of auxiliary model data. The joint optimization technique requires an order of magnitude less data than the naive one, yet reaches the same performance for an optimal number of clusters.
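For reference, the two evaluation metrics used throughout can be computed as follows; this is a minimal sketch with our own function and variable names, not the paper's implementation:

```python
import math

def rmse(preds, truths):
    # Root mean squared error over predicted vs. actual ratings.
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, truths)) / len(preds))

def ndcg_at_k(ranked_relevances, k):
    # Normalized discounted cumulative gain for one user's ranked list:
    # DCG of the system's ranking divided by DCG of the ideal ranking.
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ranked_relevances[:k]))
    ideal = sorted(ranked_relevances, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0
```

Lower RMSE and higher NDCG are better; a perfectly ordered list attains NDCG of 1.0.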

5. Survey

We conducted a survey to gauge public interest in using our SP2 framework. In total, 74 users responded, of which 74% were male and 24% were female. 92% of our respondents were within the age bracket . In our survey, we found that 57% of the participants do not rate items on any platform, whereas around 20% of the users provide a lot of ratings. About 48% of the respondents said they hesitate to rate an item because they do not want to share their opinion publicly or because they do not trust the platform. The last two questions in our survey were aimed at estimating how likely a user is to provide a rating if he/she can use our selective privacy preserving framework. When users were asked whether they would rate more items privately on their device if doing so were guaranteed to improve the quality of their recommendations, about 56% responded affirmatively, 22% said ‘maybe’, and 22% disagreed. These responses indicate that an overwhelming majority of users are willing to use our proposed selective privacy preserving framework, both to improve their recommendations and to safeguard their private information.

6. Related Work

Privacy preserving recommender systems have been well explored in the literature. Peer-to-peer (P2P) techniques (Berkovsky et al., 2007) are largely meant to protect users from untrusted servers. However, they also require users to share their private information with peers, which is a privacy breach in itself. In addition, P2P architectures lack scalability due to the limited number of trusted peers and are vulnerable to malicious interference by rogue actors. Differential privacy methods (McSherry and Mironov, 2009) provide theoretical privacy guarantees for all users, but can also adversely impact the performance of the recommender system due to data obfuscation.
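The obfuscation underlying such methods can be sketched with the standard Laplace mechanism; this is an illustrative toy (function name and parameters are ours), not the mechanism of (McSherry and Mironov, 2009):

```python
import math
import random

def dp_perturb(rating, epsilon, sensitivity=1.0):
    # Laplace mechanism: add zero-mean noise with scale sensitivity/epsilon
    # before a rating leaves the device. Smaller epsilon means stronger
    # privacy but noisier (less useful) shared data.
    u = random.random() - 0.5              # uniform in [-0.5, 0.5)
    b = sensitivity / epsilon
    sign = 1.0 if u >= 0 else -1.0
    # Inverse-CDF sampling of the Laplace distribution with scale b.
    return rating - b * sign * math.log(1.0 - 2.0 * abs(u))
```

The accuracy loss reported above follows directly: every shared rating carries noise whose scale grows as the privacy budget epsilon shrinks.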

The related literature also comprises cryptography-based techniques (Zhan et al., 2008) that approach the problem a little differently. For example, Zhan et al. (Zhan et al., 2008) used “homomorphic encryption” to integrate multiple sources of encrypted user ratings in a privacy preserving manner. However, the extreme computation time and scalability issues associated with homomorphic encryption pose a serious practicality question (Naehrig et al., 2011), even for moderate-sized datasets.

Lastly, recent federated machine learning approaches (Konećný et al., 2016) have proposed privacy-preserving techniques that build machine learning models using a secure aggregation protocol (Bonawitz et al., 2017). However, in the case of CF algorithms, this would require a user to share an update (in encrypted form) performed locally on an item factor. This means that the server could identify, from the encrypted updates, which items the user had rated privately, even though the exact ratings remain unknown. This itself constitutes a serious privacy breach (Chow et al., 2012; Advokat, 1987; McCabe, 2012). Hence, in our SP2 framework, no private user information is ever uploaded or communicated.
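The leakage can be seen in a toy sketch (names are ours, not any federated learning API): even when the uploaded values are encrypted, the set of item indices in the payload reveals which items were rated.

```python
def client_update_payload(private_ratings, encrypt):
    # private_ratings maps item_id -> rating; `encrypt` stands in for any
    # secure aggregation or homomorphic scheme that hides the values.
    # One factor update is sent per item the user rated locally.
    return {item_id: encrypt(rating) for item_id, rating in private_ratings.items()}

def server_view(payload):
    # The server cannot decrypt the values, yet the key set alone reveals
    # exactly which items the user rated privately.
    return set(payload.keys())
```

This is why SP2 keeps private ratings entirely on-device instead of uploading encrypted updates for them.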

7. Conclusion

In this paper, we proposed a novel selective privacy preserving (SP2) paradigm for CF based recommender systems that allows users to keep a portion of their ratings private while delivering better recommendations than other privacy preserving techniques. We have demonstrated the efficacy of our approach under different configurations by comparing it against other baselines on two real datasets. Finally, our framework empowers users to define their own privacy policy by determining which ratings should be private and which should be public.


  • Advokat (1987) Stephen Advokat. 1987. Publication of Bork’s video rentals raises privacy issue. Chicago Tribune (1987).
  • Arthur and Vassilvitskii (2007) David Arthur and Sergei Vassilvitskii. 2007. K-means++: The Advantages of Careful Seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’07). Society for Industrial and Applied Mathematics, 1027–1035.
  • Berkovsky et al. (2007) Shlomo Berkovsky, Yaniv Eytani, Tsvi Kuflik, and Francesco Ricci. 2007. Enhancing Privacy and Preserving Accuracy of a Distributed Collaborative Filtering. In Proceedings of the 2007 ACM Conference on Recommender Systems (RecSys ’07). ACM, 9–16.
  • Bloom (1970) Burton H. Bloom. 1970. Space/Time Trade-offs in Hash Coding with Allowable Errors. Commun. ACM 13, 7 (July 1970), 422–426.
  • Bonawitz et al. (2017) Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H. Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth. 2017. Practical Secure Aggregation for Privacy Preserving Machine Learning. Cryptology ePrint Archive, Report 2017/281. (2017).
  • Canny (2002) John Canny. 2002. Collaborative Filtering with Privacy via Factor Analysis. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’02). ACM, 238–245.
  • Chow et al. (2012) Richard Chow, Manas A. Pathak, and Cong Wang. 2012. A Practical System for Privacy-Preserving Collaborative Filtering. In 12th IEEE International Conference on Data Mining Workshops, ICDM Workshops, Brussels, Belgium, December 10, 2012. 547–554.
  • Das et al. (2017) Ariyam Das, Ishan Upadhyaya, Xiangrui Meng, and Ameet Talwalkar. 2017. Collaborative Filtering As a Case-Study for Model Parallelism on Bulk Synchronous Systems. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (CIKM ’17). ACM, 969–977.
  • Ge et al. (2016) Rong Ge, Jason D. Lee, and Tengyu Ma. 2016. Matrix Completion Has No Spurious Local Minimum. In Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS’16). Curran Associates Inc., 2981–2989.
  • Grodzevich and Romanko (2006) Oleg Grodzevich and Oleksandr Romanko. 2006. Normalization and other topics in multi-objective optimization. In Proceedings of the Fields-MITACS Industrial Problems Workshop.
  • Harper and Konstan (2015) F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Trans. Interact. Intell. Syst. 5, 4, Article 19 (Dec. 2015), 19 pages.
  • Kanungo et al. (2002) Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, and Angela Y. Wu. 2002. An Efficient k-Means Clustering Algorithm: Analysis and Implementation. IEEE Trans. Pattern Anal. Mach. Intell. 24, 7 (July 2002), 881–892.
  • Karydi and Margaritis (2016) Efthalia Karydi and Konstantinos Margaritis. 2016. Parallel and Distributed Collaborative Filtering: A Survey. ACM Comput. Surv. 49, 2, Article 37 (Aug. 2016), 41 pages.
  • Keuper and Pfreundt (2015) Janis Keuper and Franz-Josef Pfreundt. 2015. Asynchronous Parallel Stochastic Gradient Descent: A Numeric Core for Scalable Distributed Machine Learning Algorithms. In Proceedings of the Workshop on Machine Learning in High-Performance Computing Environments (MLHPC ’15). ACM, Article 1, 11 pages.
  • Konećný et al. (2016) Jakub Konećný, H. Brendan McMahan, Daniel Ramage, and Peter Richtarik. 2016. Federated Optimization: Distributed Machine Learning for On-Device Intelligence. (2016).
  • Koren et al. (2009) Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix Factorization Techniques for Recommender Systems. Computer 42, 8 (Aug. 2009), 30–37.
  • Lee and Seung (2000) Daniel D. Lee and H. Sebastian Seung. 2000. Algorithms for Non-negative Matrix Factorization. In Proceedings of the 13th International Conference on Neural Information Processing Systems (NIPS’00). MIT Press, 535–541.
  • Lee et al. (2016) Jason D. Lee, Max Simchowitz, Michael I. Jordan, and Benjamin Recht. 2016. Gradient Descent Only Converges to Minimizers. In 29th Annual Conference on Learning Theory (Proceedings of Machine Learning Research), Vitaly Feldman, Alexander Rakhlin, and Ohad Shamir (Eds.), Vol. 49. PMLR, 1246–1257.
  • Li et al. (2016) Mu Li, Ziqi Liu, Alexander J. Smola, and Yu-Xiang Wang. 2016. DiFacto: Distributed Factorization Machines. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining (WSDM ’16). ACM, 377–386.
  • Lin (2007) Chih-Jen Lin. 2007. Projected Gradient Methods for Nonnegative Matrix Factorization. Neural Comput. 19, 10 (Oct. 2007), 2756–2779.
  • Liu et al. (2017) Yuan Liu, Usman Shittu Chitawa, Guibing Guo, Xingwei Wang, Zhenhua Tan, and Shuang Wang. 2017. A Reputation Model for Aggregating Ratings Based on Beta Distribution Function. In Proceedings of the 2nd International Conference on Crowd Science and Engineering (ICCSE’17). ACM, 77–81.
  • Liu et al. (2016) Yanchi Liu, Chuanren Liu, Bin Liu, Meng Qu, and Hui Xiong. 2016. Unified Point-of-Interest Recommendation with Temporal Interval Assessment. In Proceedings of the 22Nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16). ACM, New York, NY, USA, 1015–1024.
  • Marais and Bharat (1997) Hannes Marais and Krishna Bharat. 1997. Supporting Cooperative and Personal Surfing with a Desktop Assistant. In Proceedings of the 10th Annual ACM Symposium on User Interface Software and Technology (UIST ’97). ACM, 129–138.
  • McAuley et al. (2015) Julian McAuley, Christopher Targett, Qinfeng Shi, and Anton van den Hengel. 2015. Image-Based Recommendations on Styles and Substitutes. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’15). ACM, 43–52.
  • McCabe (2012) Kathryn E. McCabe. 2012. Just you and me and netflix makes three: Implications for allowing frictionless sharing of personally identifiable information under the video privacy protection act. J. Intell. Prop. L. (2012).
  • McSherry and Mironov (2009) Frank McSherry and Ilya Mironov. 2009. Differentially Private Recommender Systems: Building Privacy into the Netflix Prize Contenders. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’09). ACM, 627–636.
  • Meng et al. (2016) Xiangrui Meng, Joseph Bradley, Burak Yavuz, Evan Sparks, Shivaram Venkataraman, Davies Liu, Jeremy Freeman, DB Tsai, Manish Amde, Sean Owen, Doris Xin, Reynold Xin, Michael J. Franklin, Reza Zadeh, Matei Zaharia, and Ameet Talwalkar. 2016. MLlib: Machine Learning in Apache Spark. J. Mach. Learn. Res. 17, 1 (Jan. 2016), 1235–1241.
  • Mishra and Reddy (2016) Sonu K. Mishra and Manoj Reddy. 2016. A Bottom-up Approach to Job Recommendation System. In Proceedings of the Recommender Systems Challenge (RecSys Challenge ’16). ACM, Article 4, 4 pages.
  • Naehrig et al. (2011) Michael Naehrig, Kristin Lauter, and Vinod Vaikuntanathan. 2011. Can Homomorphic Encryption Be Practical?. In Proceedings of the 3rd ACM Workshop on Cloud Computing Security Workshop (CCSW ’11). ACM, New York, NY, USA, 113–124.
  • Narayanan and Shmatikov (2008) Arvind Narayanan and Vitaly Shmatikov. 2008. Robust De-anonymization of Large Sparse Datasets. In Proceedings of the 2008 IEEE Symposium on Security and Privacy (SP ’08). IEEE Computer Society, 111–125.
  • Oh et al. (2015) Jinoh Oh, Wook-Shin Han, Hwanjo Yu, and Xiaoqian Jiang. 2015. Fast and Robust Parallel SGD Matrix Factorization. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’15). ACM, 865–874.
  • Schelter et al. (2013) Sebastian Schelter, Christoph Boden, Martin Schenck, Alexander Alexandrov, and Volker Markl. 2013. Distributed Matrix Factorization with Mapreduce Using a Series of Broadcast-joins. In Proceedings of the 7th ACM Conference on Recommender Systems (RecSys ’13). ACM, 281–284.
  • Shokri et al. (2009) Reza Shokri, Pedram Pedarsani, George Theodorakopoulos, and Jean-Pierre Hubaux. 2009. Preserving Privacy in Collaborative Filtering Through Distributed Aggregation of Offline Profiles. In Proceedings of the Third ACM Conference on Recommender Systems (RecSys ’09). ACM, 157–164.
  • Takács et al. (2008) Gábor Takács, István Pilászy, Bottyán Németh, and Domonkos Tikk. 2008. Matrix Factorization and Neighbor Based Algorithms for the Netflix Prize Problem. In Proceedings of the 2008 ACM Conference on Recommender Systems (RecSys ’08). ACM, 267–274.
  • Takács and Tikk (2012) Gábor Takács and Domonkos Tikk. 2012. Alternating Least Squares for Personalized Ranking. In Proceedings of the Sixth ACM Conference on Recommender Systems (RecSys ’12). ACM, 83–90.
  • Volkovs et al. (2017) Maksims Volkovs, Guangwei Yu, and Tomi Poutanen. 2017. DropoutNet: Addressing Cold Start in Recommender Systems. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc., 4964–4973.
  • Wu et al. (2016) Yao Wu, Xudong Liu, Min Xie, Martin Ester, and Qing Yang. 2016. CCCF: Improving Collaborative Filtering via Scalable User-Item Co-Clustering. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining (WSDM ’16). ACM, 73–82.
  • Xing et al. (2015) Eric P. Xing et al. 2015. Petuum: A New Platform for Distributed Machine Learning on Big Data. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’15). ACM, 1335–1344.
  • Zhan et al. (2008) Justin Zhan, I-Cheng Wang, Chia-Lung Hsieh, Tsan-sheng Hsu, Churn-Jung Liau, and Da-Wei Wang. 2008. Towards Efficient Privacy-preserving Collaborative Recommender Systems. In The 2008 IEEE International Conference on Granular Computing, GrC 2008, Hangzhou, China, 26-28 August 2008. 778–783.