Federating Recommendations Using Differentially Private Prototypes

Machine learning methods allow us to make recommendations to users in applications across fields including entertainment, dating, and commerce, by exploiting similarities in users' interaction patterns. However, in domains that demand protection of personally sensitive data, such as medicine or banking, how can we learn such a model without accessing the sensitive data, and without inadvertently leaking private information? We propose a new federated approach to learning global and local private models for recommendation without collecting raw data, user statistics, or information about personal preferences. Our method produces a set of prototypes that allows us to infer global behavioral patterns, while providing differential privacy guarantees for users in any database of the system. By requiring only two rounds of communication, we both reduce the communication costs and avoid the excessive privacy loss associated with iterative procedures. We test our framework on synthetic data as well as real federated medical data and Movielens ratings data. We show local adaptation of the global model allows our method to outperform centralized matrix-factorization-based recommender system models, both in terms of accuracy of matrix reconstruction and in terms of relevance of the recommendations, while maintaining provable privacy guarantees. We also show that our method is more robust and is characterized by smaller variance than individual models learned by independent entities.



1 Introduction

Machine learning models exploit similarities in users’ interaction patterns to provide recommendations in applications across fields including entertainment (e.g., books, movies, and articles), dating, and commerce. Such recommendation models are typically trained using millions of data points on a single, central system, and are designed under the assumption that the central system has complete access to all the data. Further, they assume that accessing the model poses no privacy risk to individuals. In many settings, however, these assumptions do not hold. In particular, in domains such as healthcare, privacy requirements and regulations may preclude direct access to data. Moreover, models trained on such data can inadvertently leak sensitive information about patients and clients. In addition to privacy concerns, when data is gathered in a distributed manner, centralized algorithms may lead to excessive memory usage and generally require significant communication resources.

As a concrete example, consider the use of recommender systems in the healthcare domain. There, recommender systems have been used in a variety of tasks including decision support (Duan et al., 2011), clinical risk stratification (Hassan and Syed, 2010) and automatic detection of omissions in medication lists (Hasan et al., 2008). Such systems are typically built using electronic health records (EHRs), which are subject to privacy constraints that limit the ability to share the data between hospitals. This restricts practical applications of recommender systems in healthcare settings as single hospitals typically do not have sufficient amounts of data to train insightful models. Even when training based on a single hospital’s data is possible, the resulting models will not capture distributional differences between hospitals, thus limiting their applicability to other hospitals.

Recently, federated learning (McMahan et al., 2017) was proposed as an algorithmic framework for settings where the data is distributed across many clients and, due to practical constraints, cannot be centralized. In federated learning, a shared server sends a global model to each client, which then updates the model using its local data. The clients send statistics of the local models (for example, gradients) to the server. The server updates the shared model based on the received client information and broadcasts the updated model to the clients. This procedure is repeated until convergence. Federated learning has proved efficient in learning deep neural networks for image classification (McMahan et al., 2017) and text generation tasks (Yang et al., 2018; Hard et al., 2018).

While federated methods address practical computing and communication concerns, privacy of the users in a federated system is potentially vulnerable. Although such systems do not share data directly, the model updates sent to the server may contain sufficient information to uncover model features and raw data information (Milli et al., 2019; Koh and Liang, 2017; Carlini et al., 2019; Hitaj et al., 2017), possibly leaking information about the users. These concerns motivate us to adopt differential privacy (Dwork et al., 2006) as a framework for limiting exposure of users’ data in federated systems. A differentially private mechanism is a randomized algorithm which allows us to bound the dependence of the output on a single data point. This, in turn, translates to bounds on the amount of additional information a malicious actor could infer about a single individual if that individual were included in the training set.

While differential privacy presents itself as a natural solution to the privacy concerns of users in federated systems, combining the two paradigms faces major challenges that stem from differences in how the two frameworks function. On the one hand, federated learning algorithms are typically iterative and query the individual entities multiple times to collect up-to-date information. On the other hand, in a differential privacy setting, where the information obtained in each query must be privatized by injecting noise, the amount of noise that must be added to each query scales linearly with the number of iterations, reducing the utility of the system and the information content of its outputs (Kairouz et al., 2019; McMahan et al., 2018).

In this paper, we present a novel differentially private federated recommendation framework for the setting where each user’s data is associated with one of many entities, e.g., hospitals, schools or banks. Each entity is tasked with maintaining the privacy of the data entrusted to it against possible attacks by malicious entities. An untrusted server is available to learn centralized models and communicate (in both directions) with the individual entities. Our method learns per-entity recommender models by sharing information between entities in a federated manner, without compromising users’ privacy or requiring excessive communication. Specifically, our method learns differentially private prototypes for each entity, and then uses those prototypes to learn global model parameters on a central server. These parameters are returned to the entities which use them to learn local recommender models without any further communication (and, therefore, without any additional privacy risk).

To our knowledge, the proposed framework is the first scheme that introduces differential privacy mechanisms to federated recommendations. Unlike typical federated learning algorithms, our method requires only two global communication steps. Such succinct communication reduces the amount of noise required to ensure differential privacy while also reducing communication overhead and minimizing the risk of communication interception. Yet despite providing differential privacy guarantees to participating entities, the framework allows each entity to benefit from data held by other entities by building its own private, uniquely adapted model. Specific contributions of the paper can be summarized as follows:

  • We propose a federated recommendation framework for learning latent representation of products and services while bounding the privacy risk to the participating entities. This is accomplished by estimating the column space of an interaction matrix from differentially private prototypes via matrix factorization.

  • We enable federating recommendations under communication constraints by building in the requirement that the number of communication rounds between participating entities and the shared server is only two.

  • We demonstrate generalizable representations and strong predictive performance in benchmarking tests on synthetic and real-world data comparing the proposed framework with individual models and conventional federated schemes that lack privacy guarantees.

2 Background

2.1 Recommender Systems

The goal of recommender systems is to suggest new content to users. Recommender systems can be broadly classified in two categories: content filtering and collaborative filtering. Under the content-based paradigm, user and/or item profiles are constructed from demographic data or item information, respectively. For example, a user profile could include age while movies could be associated with genre, principal actors, etc. With this information, similarity between users or items can be computed and utilized for recommendation via, for example, clustering or nearest neighbors techniques (Koren et al., 2009). Collaborative filtering (Goldberg et al., 1992) relies on past user behaviour (ratings, views, purchases) to make recommendations, avoiding the need for additional data collection (Herlocker et al., 1999; Koren, 2008). In this paper we focus on collaborative filtering, although our methodology could be extended to incorporate additional content-based information (Rendle, 2010). Below we introduce notation and summarize relevant techniques.

Consider a set of $N$ users and a set of $M$ items, where each user has interacted with a subset of the items. We assume that the interactions for user $i$ can be summarized via a partially observed feedback vector $x_i \in \mathbb{R}^{M}$, and that all user-item interactions can be represented by a partially observed matrix $X \in \mathbb{R}^{N \times M}$. Entries $x_{ij}$ can be in the form of explicit feedback, e.g. numerical ratings from 1 to 5, or implicit, such as binary values indicating that a user viewed or clicked on some content (Hu et al., 2008; Zhao et al., 2018; Jawaheer et al., 2010). The goal is to predict items that a user would like but has not previously interacted with (i.e., to predict which of the missing values in $X$ have high values).

2.1.1 Matrix Factorization

Matrix factorization is a popular and effective collaborative filtering approach used in many different fields to find low dimensional representation of users and items (Koren, 2008; Koren et al., 2009; Srebro and Salakhutdinov, 2010; McAuley and Leskovec, 2013).

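A matrix factorization approach assumes that users and items can be characterized in a low dimensional space $\mathbb{R}^{\ell}$ for some $\ell \ll \min(N, M)$, i.e., that the partially observed matrix $X$ can be approximated by $X \approx UV^\top$, where $U \in \mathbb{R}^{N \times \ell}$ aggregates users’ representations, and $V \in \mathbb{R}^{M \times \ell}$ collects items’ representations. In this paper, we rely on non-negative factorization, i.e., we constrain the estimates of $U$ and $V$ to be non-negative. Such a constraint often results in more interpretable latent factors and improved performance (Zhang et al., 2006; McAuley and Leskovec, 2013). In this setting, $U$ and $V$ can be estimated as

$$\hat{U}, \hat{V} = \arg\min_{U \ge 0,\, V \ge 0} \; \lVert X - UV^\top \rVert_F^2 + R(U, V), \tag{1}$$

where $R(U, V)$ is a regularizer. For the remainder of this paper, we assume $R(U, V) = \lambda \left( \lVert U \rVert_F^2 + \lVert V \rVert_F^2 \right)$.

Since we only have access to a subset of the entries of $X$, (1) is solved by minimizing the error over the training set of ratings $\mathcal{T}$,

$$\hat{U}, \hat{V} = \arg\min_{U \ge 0,\, V \ge 0} \; \sum_{(i,j) \in \mathcal{T}} \left( x_{ij} - u_i^\top v_j \right)^2 + \lambda \left( \lVert u_i \rVert^2 + \lVert v_j \rVert^2 \right), \tag{2}$$

where $u_i$ denotes the $i$th row of $U$ (i.e., the latent representation for the $i$th user) and $v_j$ is the $j$th row of $V$ (i.e., the latent representation for the $j$th item).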
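The objective over observed ratings can be illustrated with a small sketch. The paper does not state which solver it uses; the code below assumes plain projected stochastic gradient descent, and the function names `factorize` and `rmse`, the learning rate, and the toy ratings are all hypothetical choices made for illustration.

```python
import random

def factorize(ratings, n_users, n_items, rank, lam=0.01, lr=0.05, epochs=300, seed=0):
    """Non-negative matrix factorization fitted only on observed (user, item, value) triples."""
    rng = random.Random(seed)
    U = [[rng.random() for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.random() for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for i, j, x in ratings:
            err = x - sum(a * b for a, b in zip(U[i], V[j]))
            for f in range(rank):
                u, v = U[i][f], V[j][f]
                # Gradient step on the regularized squared error (constant factor
                # absorbed into lr), then projection onto the non-negative orthant.
                U[i][f] = max(0.0, u + lr * (err * v - lam * u))
                V[j][f] = max(0.0, v + lr * (err * u - lam * v))
    return U, V

def rmse(ratings, U, V):
    """Root mean squared error over a list of (user, item, value) triples."""
    se = sum((x - sum(a * b for a, b in zip(U[i], V[j]))) ** 2 for i, j, x in ratings)
    return (se / len(ratings)) ** 0.5

# Toy rank-1 data: x_ij = (i + 1) * (j + 1), with entry (2, 1) left unobserved.
train = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 2.0), (1, 1, 4.0), (2, 0, 3.0)]
U, V = factorize(train, n_users=3, n_items=2, rank=2)
train_error = rmse(train, U, V)
```

Only the observed triples enter the loss, so the missing entry is predicted from the learned factors rather than treated as a zero.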

2.2 Federated learning

Federated learning was introduced by McMahan et al. (2017) as a framework for learning models in a decentralized manner, and originally applied to learning with neural networks. The goal of federated learning is to infer a global model without collecting raw data from participating users. This is achieved by having the users (or entities representing multiple users) locally compute model updates based on their data and share these updates with a central server. The server then updates the global model and sends it back to the users.

While they avoid directly sharing users’ data, most federated learning algorithms offer no formal guarantees that a malicious agent could not infer private information from the updates. For example, in a naïve application of the original federated learning method (McMahan et al., 2017) to a matrix-factorization-based recommender system, each entity shares parameters including a low-dimensional representation of each user, leading to a high risk of potential privacy breaches.
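The iterative exchange described in this section can be sketched for a one-parameter linear model. This is a simplified illustration in the spirit of federated averaging, not the authors' method; `local_update`, `federated_round`, and the toy client data are hypothetical.

```python
def local_update(w, data, lr=0.1, steps=5):
    """Client side: a few gradient steps on the local squared error for y ~ w * x."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_round(w_global, clients):
    """Server side: broadcast the model, collect local models, average by client size."""
    total = sum(len(data) for data in clients)
    return sum(len(data) * local_update(w_global, data) for data in clients) / total

# Three clients whose data all follow y = 2 * x; raw data never leaves a client.
clients = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)], [(0.5, 1.0), (1.5, 3.0)]]
w = 0.0
for _ in range(20):  # each round is one more query against the private data
    w = federated_round(w, clients)
```

Every pass through the loop sends a function of the clients' data to the server, which is why the number of rounds matters once each transmission has to be privatized.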

2.3 Differential Privacy

Differential privacy (Dwork et al., 2006) is a statistical notion of privacy that bounds the potential privacy loss an individual risks by allowing her data to be used in the algorithm.

Definition 2.1.

A randomized algorithm $\mathcal{A}$ satisfies $\epsilon$-differential privacy ($\epsilon$-DP) if for any datasets $D$ and $D'$ differing by only one record and any subset of outcomes $S \subseteq \mathrm{range}(\mathcal{A})$,

$$\Pr[\mathcal{A}(D) \in S] \le e^{\epsilon} \, \Pr[\mathcal{A}(D') \in S].$$

In other words, for any possible outcome, including any given individual in a data set can only change the probability of that outcome by at most a multiplicative constant which depends on $\epsilon$. Differential privacy has been applied to recommender systems by adding noise to the average item ratings and the item-item covariance matrix (McSherry and Mironov, 2009). However, this approach is designed for systems wherein a centralized server needs to collect all the data to derive users’ and items’ means and covariances. Differential privacy is more difficult to impose in iterative algorithms, such as those commonly used in federated learning scenarios, since the iterative nature of these algorithms requires splitting the privacy budget across iterations, which raises technical challenges (Abadi et al., 2016; Wu et al., 2017; McMahan et al., 2018).
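As a concrete instance of the definition, the following sketch releases a counting query with the Laplace mechanism. The Laplace mechanism is a standard textbook example and is used here only for illustration; it is not the mechanism the paper builds on, and the helper names are hypothetical.

```python
import math
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def private_count(records, predicate, epsilon, rng):
    """Counting query (sensitivity 1) released with Laplace(1 / epsilon) noise."""
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)

def laplace_density(z, center, scale):
    """Density of the released value z when the true count is `center`."""
    return math.exp(-abs(z - center) / scale) / (2.0 * scale)

epsilon = 0.5
rng = random.Random(0)
noisy = private_count([31, 45, 52, 67], lambda age: age > 50, epsilon, rng)

# Neighbouring datasets change the true count by at most 1, so the density
# ratio at any output z stays within [exp(-epsilon), exp(epsilon)].
worst_ratio = max(
    laplace_density(z / 10.0, 2, 1.0 / epsilon) / laplace_density(z / 10.0, 3, 1.0 / epsilon)
    for z in range(-50, 100)
)
```

The bound on `worst_ratio` is exactly the multiplicative constant in the definition, evaluated pointwise on the output density.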

2.3.1 Differentially private prototypes

Our design of private prototypes is motivated by the efficient differentially private $k$-means estimator for high-dimensional data introduced in Balcan et al. (2017). This algorithm first relies on the Johnson-Lindenstrauss lemma to project the data into a lower dimensional space that still preserves much of the data’s underlying structure. Then, the space is recursively subdivided, with each subregion and its corresponding centroid being considered a candidate centroid with probability that depends on the number of points in the region and the desired value of privacy $\epsilon$. The final $k$-means are selected from the candidate centroids by recursively swapping out candidates using the exponential mechanism (McSherry and Talwar, 2007), where the score for each potential collection is the clustering loss. The selected candidates are mapped back to the original space by taking a noisy mean of data points in the corresponding cluster, providing $\epsilon$-DP.

The complete algorithm is parametrized by the number of clusters ; the privacy budget ; a parameter such that with probability at least , the clustering loss associated with the centers satisfies

where OPT is the optimal loss under a non-private algorithm; is the number of data points; is the dimensionality of the data; and is the radius of a ball that bounds the data. We formalize this algorithm as private_prototypes in the supplementary material.

The Balcan et al. (2017) method is one of a number of differentially private algorithms for finding cluster representatives or prototypes. Blum et al. (2005) introduced SuLQ $k$-means, where the server updating clusters’ centers receives only noisy averages. Unlike the approach of Balcan et al. (2017), this algorithm does not have guarantees on the convergence of the loss function. Nissim et al. (2007) and Wang et al. (2015) use a similar framework but calibrate the noise by local sensitivity, which is difficult to estimate without assumptions on the dataset (Zhu et al., 2017). Private coresets have been used to construct differentially private $k$-means and $k$-medians estimators (Feldman et al., 2017), but this approach does not scale to large data sets.

3 Related Work

To the best of our knowledge, the current paper is the first to propose learning recommender systems in a federated manner while guaranteeing differential privacy. However, a number of other approaches have been proposed for incorporating notions of privacy in recommender systems (Friedman et al., 2015); we summarize the main ones below.

Cryptographic methods.

A number of private recommender systems have been developed using a cryptographic approach (Miller et al., 2004; Kobsa and Schreck, 2003). Such methods use encryption to protect users by encoding personal information with cryptographic functions before it is communicated. In the healthcare context, Hoens et al. (2013) have applied cryptographic methods to providing physician recommendations. However, these methods require centralizing the dataset to perform calculations on the protected data, which may be infeasible when the total data size is large or communication bandwidth is limited, or where regulations prohibit sharing of individuals’ data even under encryption.

Federated recommender systems.

In addition to the generic federated learning algorithms discussed in Sec 2.2, alternative federation methods have been proposed for matrix factorization, where the information being shared is less easily mapped back to individual users. Kim et al. (2017) consider federated tensor factorization for computational phenotyping. There, the objective function is broken into subproblems using the Alternating Direction Method of Multipliers (ADMM, Boyd et al., 2011), where the alternated optimization is utilized to distribute the optimization between different entities. User factors are learned locally, and the server updates the global factor matrix and sends it back to each entity. In a similar way, Ammad-ud-din et al. (2019) perform federated matrix factorization by taking advantage of alternating least squares. They decouple the optimization process, globally updating items’ factors and locally updating users’ factors. These two approaches converge to the same solution as non-federated methods. However, since current variables need to be shared at each optimization stage, this technique requires large communication rates and users’ synchronization. While either of the above factorization methods could be adapted to recommender systems, they lack strict privacy guarantees and require extensive communication.

Differential privacy.

In a recommender systems context, McSherry and Mironov (2009) rely on differential privacy results to obtain noisy statistics from ratings. Although the resulting model provides privacy guarantees, it requires access to the centralized raw data in order to estimate the appropriate statistics. This makes it unsuited for the data-distributed setting we consider.

Exploiting public data.

Xin and Jaakkola (2014) consider matrix factorization methods to learn $X \approx UV^\top$ (see Sec 2) in the setting where we can learn the item representation matrix $V$ from publicly available data. The public item matrix $V$ is then shared with private entities to locally estimate their latent factors matrix $U$. The applicability of this approach is hindered by potentially limited access to public data, which is the case in sensitive applications such as healthcare recommendations. Our approach provides an alternative method for learning a shared estimate of $V$ from appropriately obscured private data.

4 A Differentially Private, Federated Recommender System

We propose a model for learning a recommender system in a federated setting where different entities possess a different number of records. We assume the data is split between entities such that each entity possesses data for more than one user. The partially observed user-item interaction matrix associated with the $h$th entity is denoted by $X_h$.

We assume that the training data is sensitive and should not be shared outside the entity to which it belongs. While each entity will need to communicate information to a non-private server, we wish to ensure this communication does not leak sensitive information.

In order for differential privacy and federated recommender systems to work in concert, our framework must accomplish two objectives: 1) make recommendations privately by injecting noise in a principled way, and 2) reduce the number of communications to minimize the amount of injected noise. The solutions to these requirements are interrelated. We first describe a method that reduces the number of communication steps to two, and then proceed to describe how to solve the privacy challenge.

4.1 A One-shot Private System

Most federated learning methods require multiple rounds of communication between entities and a central server, which poses a problem for differential privacy requirements. Specifically, we can think of each round of communication from the entities to the server as a query sent to the individual entities, which has the potential to leak information. If we query an $\epsilon$-DP mechanism $T$ times, then the sequence of queries is only $T\epsilon$-DP (McSherry, 2009). In practice, this means that the more communication we require, the more noise must be added to maintain the same level of differential privacy.
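The cost of repeated querying can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that each round is privatized with the Laplace mechanism on a sensitivity-1 query; the function names are hypothetical.

```python
def per_round_epsilon(total_epsilon, rounds):
    """Sequential composition: T queries at epsilon / T each spend epsilon overall."""
    return total_epsilon / rounds

def laplace_scale(sensitivity, epsilon):
    """Noise scale of the Laplace mechanism for a query with the given sensitivity."""
    return sensitivity / epsilon

total_epsilon = 1.0
# Noise scale per round when the same total budget is split over T rounds.
scales = {T: laplace_scale(1.0, per_round_epsilon(total_epsilon, T)) for T in (1, 10, 100)}
```

Under these assumptions the per-round noise scale grows in proportion to the number of rounds, which is the motivation for restricting the protocol to a single privatized transmission.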

To minimize the amount of noise a differential privacy technique will introduce, our method must limit the number of communication calls between the entities. In the context of matrix factorization-based recommendations, which involve estimating $X \approx UV^\top$, as discussed in Sec 3, Xin and Jaakkola (2014) show that transmission of private data can be avoided by using a public dataset to learn the shared item representation $V$. Given $V$, each entity $h$ can privately estimate its user matrix $U_h$ without releasing any information about its interaction matrix $X_h$. Building upon this idea, we constrain the communication to only two rounds, back and forth. However, in our problem setting we do not have access to a public data set. Instead, we construct a shared item representation based on privatized prototypes collected from each entity. These prototypes are designed to: a) contain similar information as $X_h$, thus allowing construction of an accurate item representation; b) be of low dimension relative to $X_h$, hence minimizing communication load; and c) maintain differential privacy with respect to the individual users. We elaborate on building prototypes in Sec 4.2.

Once we have generated prototypes for each entity, we send them to a centralized server that learns a shared item representation $V$ through traditional matrix factorization (see Sec 2). This shared matrix is then communicated back to the individual entities, which use it to learn their own users’ profile matrices and make local predictions.
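The local step that follows the second communication round can be sketched as a non-negative least-squares fit of one user's factors against a fixed item matrix. The paper does not specify the local solver; the projected gradient routine `fit_user`, its hyperparameters, and the toy item matrix below are assumptions made for illustration.

```python
def fit_user(x_row, observed, V, lam=0.01, lr=0.05, steps=500):
    """Estimate one non-negative user vector from fixed item factors V.

    x_row: the user's feedback values indexed by item.
    observed: indices of the items this user actually interacted with.
    """
    rank = len(V[0])
    u = [0.5] * rank
    for _ in range(steps):
        grad = [lam * value for value in u]
        for j in observed:
            err = sum(a * b for a, b in zip(u, V[j])) - x_row[j]
            for f in range(rank):
                grad[f] += err * V[j][f]
        # Gradient step followed by projection onto the non-negative orthant.
        u = [max(0.0, value - lr * g) for value, g in zip(u, grad)]
    return u

# Shared item matrix received from the server (toy values, 3 items, rank 2).
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x_row = [2.0, 3.0, None]          # item 2 has not been observed for this user
u = fit_user(x_row, observed=[0, 1], V=V)
predicted_item_2 = sum(a * b for a, b in zip(u, V[2]))
```

Nothing in this step leaves the entity: the feedback row and the fitted user vector stay local, and only the item matrix was received from the server.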

In contrast to iterative methods, the proposed approach requires only two rounds of communication: one from the entities to the server, and one from the server to the entities. In addition to reducing communication costs and removing the need for synchronization, this strategy allows us to conserve the privacy budget. With only one communication requiring privatization, we are able to minimize the noise that must be added to guarantee a desired level of privacy.

4.2 Learning Prototypes

For our algorithm to succeed, we must find a way to share the information from all entities in order to build a global item representation matrix $V$. We want the prototypes to be representative of the data set, i.e., ensure they convey useful information. Note that to satisfy $\epsilon$-differential privacy, each set of prototypes must be $\epsilon$-differentially private.

Differentially private dataset synthesis methods (see Bowen and Liu (2016) for a survey) could be used to generate a synthetic dataset having statistical properties similar to the entity's interaction matrix $X_h$. However, these methods tend to be ill-suited for high-dimensional settings and would involve sending a large amount of data to the server. Instead, we consider methods that find differentially private prototypes of our dataset, with the aim of obtaining fewer samples that still capture much of the variation present in the individual data. Since we will use these prototypes to capture low-rank structure, provided each entity sends a number of prototypes larger than the rank, it is possible for such prototypes to contain the information required to recover singular vectors of $X_h$ yet still be smaller than $X_h$, thus reducing the amount of information that needs to be communicated. When selecting the prototype mechanism, we recall the following two observations.



Remark 1.

Non-negative matrix factorization (NMF) and spectral clustering have been shown to be equivalent (Ding et al., 2005).

Remark 2.

(Theorem 3 in Pollard (1982)) Let $C^*$ be the optimum of the $k$-means objective on a dataset distributed according to some distribution $P$ on $\mathbb{R}^d$, and let $\mathcal{Q}_k$ be the set of discrete distributions on $\mathbb{R}^d$ with support size at most $k$. Then, the discrete distribution implied by $C^*$ is the closest discrete distribution to $P$ in $\mathcal{Q}_k$ with respect to the 2-Wasserstein metric.

Since we are learning the item matrix via NMF, Remark 1 suggests that one should capture the centroids of clusters in each entity's interaction matrix $X_h$ to preserve spectral information. Remark 2 implies that the prototypes obtained via $k$-means are close, in a distributional sense, to the underlying distribution. Following these intuitions, we consider prototype generation methods based on $k$-means. Since the learned prototypes are created to capture the same latent representation that would be captured by NMF, we expect the estimated item matrix $\hat{V}$ to be close to the true $V$.
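To make the notion of $k$-means prototypes concrete, the following sketch runs plain Lloyd's iterations on a handful of points. It is deliberately non-private and is not the Balcan et al. (2017) algorithm; the `kmeans` helper and the toy points are hypothetical.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20, seed=0):
    """Plain (non-private) Lloyd's algorithm; returns k centroids as prototypes."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centers[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the previous center if a cluster becomes empty
                centers[c] = [sum(column) / len(members) for column in zip(*members)]
    return centers

# Six rows of a toy interaction matrix forming two groups of similar users.
rows = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
prototypes = sorted(kmeans(rows, k=2))
```

Two prototypes summarize six rows here; in the federated setting it is such summaries, after privatization, that would be sent to the server instead of the rows themselves.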

Because it is well suited to high-dimensional data, we adopt the framework of the differentially private candidates algorithm of Balcan et al. (2017). Note that this algorithm initially maps the data onto a low-dimensional space; however, since we are using the prototypes to learn a low-dimensional representation, such a mapping is unlikely to adversely impact the accuracy of the proposed method. We augment the scheme of Balcan et al. (2017) with a novel recovery algorithm while maintaining its accuracy and privacy guarantees. The algorithm increases overall efficiency by exploiting the sparsity of the data and the Gumbel trick, often used to efficiently sample from discrete distributions (Papandreou and Yuille, 2011; Balog et al., 2017; Durfee and Rogers, 2019).

After obtaining cluster assignments for each datapoint, instead of sequentially applying the exponential mechanism to recover non-zero entries of the centroid, we add noise drawn from a Gumbel distribution to the centroid mean and take the top-$s$ entries, where $s$ denotes the number of non-zero entries in the dataset. We formalize this procedure as Algorithm sparserec. Algorithm fedRecSys summarizes the proposed private, federated recommender system.
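The one-shot selection step can be sketched as follows. This is not the paper's recovery algorithm, whose details are in the supplementary material; it only illustrates the general Gumbel trick, assuming the usual exponential-mechanism scaling of the scores by epsilon / (2 * sensitivity). The function name and the example scores are hypothetical.

```python
import math
import random

def gumbel_top_s(scores, s, epsilon, sensitivity=1.0, rng=None):
    """Return the indices of the s largest scores after adding Gumbel(0, 1) noise."""
    rng = rng or random.Random()
    noisy = []
    for index, score in enumerate(scores):
        uniform = rng.random()
        while uniform == 0.0:  # avoid log(0); random() returns values in [0, 1)
            uniform = rng.random()
        gumbel = -math.log(-math.log(uniform))
        noisy.append((epsilon * score / (2.0 * sensitivity) + gumbel, index))
    noisy.sort(reverse=True)
    return [index for _, index in noisy[:s]]

# Coordinates 0 and 2 dominate the centroid, so they should survive the noise.
centroid = [1000.0, 0.0, 900.0, 0.0, 0.0]
selected = gumbel_top_s(centroid, s=2, epsilon=1.0, rng=random.Random(0))
```

A single pass of noise plus a sort replaces $s$ sequential draws, which is where the efficiency gain mentioned in the text comes from.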

Theorem 4.1.

Algorithm fedRecSys is $\epsilon$-differentially private.

Proof. The server interacts with the private datasets only once, when collecting the private prototypes. Durfee and Rogers (2019) prove that adding Gumbel noise to the utility function $u$, and selecting the top $s$ values from the noisy utility, is equivalent to applying the exponential mechanism $s$ times; therefore, transmission of a single prototype is $\epsilon$-DP. The parallel composition theorem (McSherry, 2009) establishes that the overall privacy budget is given by the maximum of the individual budgets, implying that the overall algorithm is $\epsilon$-DP. ∎

5 Experiments

We first test the performance of the proposed differentially-private federated recommender system on synthetic data and report the results in Sec 5.3. Then, to demonstrate the ability to provide high-quality recommendations in realistic settings, in Sec 5.4 we apply the system to real-world datasets.

For all the experiments, we fixed the level of regularization to since we did not observe notable difference in performance when varying from to .

5.1 Datasets

We test the proposed scheme on three different datasets. The first one is a synthetic dataset intended to simulate discrete processes such as ratings or counting event occurrences. The relevant matrices are generated as , , and . We set , , , and distribute the data uniformly across 10 different entities.

The second dataset is from the eICU Collaborative Research Database (Pollard et al., 2018), which contains data collected from critical care units throughout the continental United States in 2014 and 2015. Since different visits can have diverse causes and diagnoses, we count each patient visit as a separate observation. We use the laboratories and medicines tables from the database, and create a 2-way table where each row represents a patient and each column either a lab or medicine. Matrix is composed using data from over k patients, laboratories and medications, and hospitals. Each entry represents how many times a patient took a test or a medication; the goal is to recommend treatments.

Finally, we consider the MovieLens 1M dataset, containing 1,000,209 anonymous ratings from 6,040 MovieLens users on approximately 3,900 movies. We use the first digit of each user’s ZIP code to set up a natural federation of the data.
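The ZIP-code federation can be sketched as a simple grouping step. The sketch assumes the `UserID::Gender::Age::Occupation::Zip-code` line layout of the MovieLens 1M `users.dat` file; the function name and the sample lines are illustrative, and the paper does not describe its preprocessing code.

```python
from collections import defaultdict

def federate_by_zip(user_lines):
    """Group user ids into entities keyed by the first digit of the ZIP code."""
    entities = defaultdict(list)
    for line in user_lines:
        user_id, _gender, _age, _occupation, zip_code = line.strip().split("::")
        entities[zip_code[0]].append(int(user_id))
    return dict(entities)

# Illustrative lines in the assumed users.dat layout.
sample = [
    "1::F::1::10::48067",
    "2::M::56::16::70072",
    "3::M::25::15::55117",
    "4::M::45::7::02460",
    "5::M::25::20::55455",
]
entities = federate_by_zip(sample)
```

Each key then plays the role of one entity holding the ratings of its own users, giving at most ten entities of unequal size.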

5.2 Evaluation metrics

To assess convergence and perform parameter tuning, we use the Root Mean Squared Error (RMSE) between the real $X$ and the reconstructed $\hat{X}$. In the case of the synthetic data, RMSE is a suitable measure to examine the fit quality since we have access to the ground truth.

Additionally, to evaluate the quality of recommendations in the hospital and movie data tests, we compare the real and predicted rankings over the test samples using Mean Average Ranking (Hu et al., 2008). Concretely, let $\mathrm{rank}_{ui}$ be the percentile of the predicted ranking of item $i$ for user $u$, where 0 means very highly ranked and above all other items. We calculate $\overline{\mathrm{rank}}$ on a test set $\mathcal{T}$, defined as

$$\overline{\mathrm{rank}} = \frac{\sum_{(u,i) \in \mathcal{T}} x_{ui} \, \mathrm{rank}_{ui}}{\sum_{(u,i) \in \mathcal{T}} x_{ui}}. \tag{3}$$

This measure compares the similarity between the real and predicted ranks. Intuitively, for a random ranking the expected $\mathrm{rank}_{ui}$ is $0.5$, so $\overline{\mathrm{rank}} \ge 0.5$ means a ranking no better than random. Conversely, lower values indicate highly ranked recommendations matching the users’ patterns.
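The ranking metric described above can be sketched in a few lines. The sketch assumes the feedback-weighted percentile-rank form of Hu et al. (2008); the function names, the tie handling, and the toy scores are illustrative choices rather than the authors' evaluation code.

```python
def percentile_ranks(scores):
    """Map one user's predicted scores to percentile ranks: 0.0 is the top item."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0.0] * len(scores)
    for position, j in enumerate(order):
        ranks[j] = position / (len(scores) - 1) if len(scores) > 1 else 0.0
    return ranks

def mean_average_ranking(test_entries, predicted):
    """test_entries: (user, item, value) triples; predicted[user]: scores over all items."""
    numerator = denominator = 0.0
    for user, item, value in test_entries:
        numerator += value * percentile_ranks(predicted[user])[item]
        denominator += value
    return numerator / denominator

test_entries = [(0, 0, 2.0), (0, 2, 1.0)]
good = mean_average_ranking(test_entries, {0: [0.9, 0.1, 0.5]})  # item 0 ranked first
bad = mean_average_ranking(test_entries, {0: [0.1, 0.9, 0.5]})   # item 0 ranked last
```

Weighting by the held-out feedback value means that misplacing a frequently used item costs more than misplacing a rarely used one.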

5.3 Evaluating the impact of federation and privacy on synthetic data

Recall that our algorithm differs from standard matrix factorization schemes in two key aspects: first, it learns the item matrix $V$ using prototypes, rather than the actual data; second, it learns the users’ sub-matrices $U_h$ independently given $V$, rather than jointly. Moreover, instead of learning the prototypes using exact $k$-means, to ensure differential privacy we use an $\epsilon$-DP algorithm. Here we explore the effect of these algorithmic features.

In particular, we compare our framework with the following algorithms:

  • Matrix factorization: Apply Eq 2 until convergence on $X$.

  • MF + $k$-means: Apply Eq 2 to factorize a matrix of exemplars $\tilde{X}$, where $\tilde{X}$ collects the $k$-means from each matrix $X_h$. Use the estimate $\hat{V}$ to learn individual matrices $U_h$ from $X_h$.

  • MF + $k$-random: Identical to MF + $k$-means, but instead of using the cluster means, use $k$ random samples from $X_h$.

  • MF + $\epsilon$-private prototypes: Identical to MF + $k$-means, but instead of using true cluster means, use the generated $\epsilon$-DP prototypes (see Algorithm private_prototypes in the supplementary).

(a) RMSE vs. latent factors for non-private methods.
(b) RMSE vs. $k$ for non-private methods.
(c) RMSE vs. latent factors for private methods.
(d) RMSE vs. $k$ for private methods.
Figure 1: Results on synthetic data.

(a) Training set RMSE.
(b) Test set RMSE.
(c) Average ranking (Eq 3).
Figure 2: Comparison of three different methods on the eICU dataset.

We first evaluate how $k$-means performs in a non-private setting. Figures 1(a) and 1(b) show the RMSE when $k$ and the number of latent factors are fixed, respectively. (In the supplementary, we provide additional experiments varying $k$ between 10 and 300 and the dimension of the latent space between 20 and 80; the behavior is similar.) In both figures, we see unsurprisingly that MF has the lowest RMSE, with $k$-random exemplars from the original dataset performing second best. For larger values of $k$ in Fig 1(b), $k$-means performance deteriorates compared to $k$-random. Based on examination of the centroids, this is most likely due to $k$-means overfitting to outliers for larger values of $k$, while $k$-random performance improves as its number of exemplars approaches the full dataset. We note that our synthetic data does not contain any clusters, so this is the worst-case scenario for the $k$-means setting; even so, we observe that the difference in reconstructive performance between the three methods is fairly small. None of the above methods guarantees privacy.

Next, we compare the performance of private $k$-means and non-private $k$-means. In Fig 1(c), we consider a relatively small value of $k$, and investigate the effect of $\epsilon$ as the number of latent factors changes. As expected, larger values of $\epsilon$ (i.e., less private settings) yield better results. Here we observe little difference in performance between the private and non-private algorithms. However, in Fig 1(d) we see that for large $k$, the private methods perform significantly better than the non-private $k$-means, mirroring the results in Fig 1(b). We hypothesise that the noise introduced in the private and random scenarios acts as a regularizer, helping avoid overfitting. We note that, since the sensitivity of the random exemplar mechanism is equal to the range of the data, directly privatizing random exemplars would add excessive noise.

In both Fig 1(c) and Fig 1(d), we find that decreasing $\epsilon$ (and therefore increasing privacy) does not have a significant negative effect on the reconstruction quality. In Fig 1(d), for larger values of $k$, MF + private $k$-means performs equally well, even for the smallest value of $\epsilon$, as the noise is averaged over a large number of samples. Here, we can guarantee differential privacy with a smaller $\epsilon$ in place of a larger one with a minimal drop in RMSE.

5.4 Evaluating the federated recommender system

To evaluate the entire system, we assess our model on real-world data from the eICU dataset and the Movielens 1M dataset. Similar to the experiments in the previous section, we assume that each entity extracts exemplars via private prototypes and sends them to the server. The server learns the item matrix and sends it back to the entities. Each entity learns its own user matrix and reconstructs its data matrix. We construct a test set by randomly selecting 20% of the users; for each selected user, we randomly select five entries.
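The test-set construction described above can be sketched as follows. This is an illustrative assumption about the procedure, not the authors' code: the function name and arguments are hypothetical, and entries are drawn uniformly from all item indices, whereas the actual experiments may sample only from observed entries.

```python
import random


def sample_test_entries(n_users, n_items, user_fraction=0.2,
                        entries_per_user=5, seed=0):
    """Pick a fraction of users, then a fixed number of entries for each.

    Returns a list of (user, item) index pairs to hold out for testing.
    """
    rng = random.Random(seed)
    n_test_users = max(1, round(user_fraction * n_users))
    test_users = sorted(rng.sample(range(n_users), n_test_users))
    test_entries = []
    for user in test_users:
        # Sampling without replacement gives distinct items per user.
        for item in rng.sample(range(n_items), entries_per_user):
            test_entries.append((user, item))
    return test_entries
```

With 50 users and 20 items, this yields 10 test users and 50 held-out entries.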

We compare our private federated recommender system with: 1) non-private centralized matrix factorization, and 2) individual centralized matrix factorization for each hospital. The comparison is performed using different numbers of latent factors varied between 10 and 50. For the private prototypes, we fix for the hospitals’ data and for Movielens; in both cases, .

The results on the two datasets are similar, so we present the Movielens results in the supplementary. Fig 2(a) and Fig 2(b) show the average reconstruction error over training and test data, respectively. As expected, on the training set the individual models achieve lower RMSE than the jointly learned model, since with one model per hospital there are correspondingly more parameters to model the overall variation. Perhaps surprisingly, given the noise introduced via the differential privacy mechanism, the federated model achieves training set RMSE comparable to that achieved by individual models.

Analysis of the test set RMSE (Fig 2(b)) reveals the benefit of the federated model. The individual models obtain RMSE comparable to the jointly learned model, indicating that the low training set RMSE results from the individual models overfitting. The federated model, however, generalizes well to the test set. We hypothesise that this is because the jointly learned item matrix aids in generalization, and the use of noisy prototypes discourages overfitting.

Fig 2(c) shows the average ranking quality for the three methods. Consistent with the test set RMSE, the federated model obtains the best ranking performance. As intended, the federated model allows each hospital to improve its predictions by obtaining relevant information from other hospitals, without compromising its patients’ information.

6 Conclusion

We propose a novel, efficient framework to learn recommender systems in federated settings. Our framework enables entities to collaborate and learn common patterns without compromising users’ privacy, while requiring minimal communication. Our method assumes individuals are grouped into entities, at least some of which are large enough to learn informative prototypes; we do not require privacy within an entity.

A future direction could be to extend this approach to the more extreme scenarios where each entity represents a single individual. This would be useful for commerce or content sites where each user wants to maintain privacy. Another avenue for future work is to investigate error bounds for the reconstructed matrix. Such results could allow entities to determine an appropriate privacy budget while still learning useful models.


  • M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang (2016) Deep learning with differential privacy. In SIGSAC, pp. 308–318. Cited by: §2.3.
  • M. Ammad-ud-din, E. Ivannikova, S. A. Khan, W. Oyomno, Q. Fu, K. E. Tan, and A. Flanagan (2019) Federated collaborative filtering for privacy-preserving personalized recommendation system. arXiv preprint arXiv:1901.09888. Cited by: §3.
  • M. Balcan, T. Dick, Y. Liang, W. Mou, and H. Zhang (2017) Differentially private clustering in high-dimensional Euclidean spaces. In ICML, pp. 322–331. Cited by: §B.1, §2.3.1, §2.3.1, §4.2.
  • M. Balog, N. Tripuraneni, Z. Ghahramani, and A. Weller (2017) Lost relatives of the Gumbel trick. In ICML, pp. 371–379. Cited by: §4.2.
  • A. Blum, C. Dwork, F. McSherry, and K. Nissim (2005) Practical privacy: the SuLQ framework. In PODS, pp. 128–138. Cited by: §2.3.1.
  • C. M. Bowen and F. Liu (2016) Comparative study of differentially private data synthesis methods. arXiv preprint arXiv:1602.01063. Cited by: §4.2.
  • S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, et al. (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. FTML 3 (1), pp. 1–122. Cited by: §3.
  • N. Carlini, C. Liu, Ú. Erlingsson, J. Kos, and D. Song (2019) The secret sharer: evaluating and testing unintended memorization in neural networks. In USENIX Security Symposium, pp. 267–284. Cited by: §1.
  • C. Ding, X. He, and H. D. Simon (2005) On the equivalence of nonnegative matrix factorization and spectral clustering. In ICDM, pp. 606–610. Cited by: Remark 1.
  • L. Duan, W. N. Street, and E. Xu (2011) Healthcare information systems: data mining methods in the creation of a clinical recommender system. Enterprise Information Systems 5 (2), pp. 169–181. Cited by: §1.
  • D. Durfee and R. M. Rogers (2019) Practical differentially private top-k selection with pay-what-you-get composition. In NeurIPS, pp. 3527–3537. Cited by: §4.2, §4.2.
  • C. Dwork, F. McSherry, K. Nissim, and A. Smith (2006) Calibrating noise to sensitivity in private data analysis. In TCC, pp. 265–284. Cited by: §1, §2.3.
  • D. Feldman, C. Xiang, R. Zhu, and D. Rus (2017) Coresets for differentially private k-means clustering and applications to privacy in mobile sensor networks. In IPSN, pp. 3–16. Cited by: §2.3.1.
  • A. Friedman, B. P. Knijnenburg, K. Vanhecke, L. Martens, and S. Berkovsky (2015) Privacy aspects of recommender systems. In Recommender Systems Handbook, pp. 649–688. Cited by: §3.
  • D. Goldberg, D. Nichols, B. M. Oki, and D. Terry (1992) Using collaborative filtering to weave an information tapestry. Communications of the ACM 35 (12), pp. 61–71. Cited by: §2.1.
  • A. Hard, K. Rao, R. Mathews, F. Beaufays, S. Augenstein, H. Eichner, C. Kiddon, and D. Ramage (2018) Federated learning for mobile keyboard prediction. arXiv preprint arXiv:1811.03604. Cited by: §1.
  • S. Hasan, G. T. Duncan, D. B. Neill, and R. Padman (2008) Towards a collaborative filtering approach to medication reconciliation. In AMIA Annual Symposium, Vol. 2008, pp. 288. Cited by: §1.
  • S. Hassan and Z. Syed (2010) From Netflix to heart attacks: collaborative filtering in medical datasets. In IHI, pp. 128–134. Cited by: §1.
  • J. L. Herlocker, J. A. Konstan, A. Borchers, and J. Riedl (1999) An algorithmic framework for performing collaborative filtering. In SIGIR, pp. 230–237. Cited by: §2.1.
  • B. Hitaj, G. Ateniese, and F. Perez-Cruz (2017) Deep models under the gan: information leakage from collaborative deep learning. In CCS, pp. 603–618. Cited by: §1.
  • T. R. Hoens, M. Blanton, A. Steele, and N. V. Chawla (2013) Reliable medical recommendation systems with patient privacy. TIST 4 (4), pp. 67. Cited by: §3.
  • Y. Hu, Y. Koren, and C. Volinsky (2008) Collaborative filtering for implicit feedback datasets. In ICDM, pp. 263–272. Cited by: §2.1, §5.2.
  • G. Jawaheer, M. Szomszor, and P. Kostkova (2010) Comparison of implicit and explicit feedback from an online music recommendation service. In HetRec, pp. 47–51. Cited by: §2.1.
  • P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummings, et al. (2019) Advances and open problems in federated learning. arXiv preprint arXiv:1912.04977. Cited by: §1.
  • Y. Kim, J. Sun, H. Yu, and X. Jiang (2017) Federated tensor factorization for computational phenotyping. In SIGKDD, pp. 887–895. Cited by: §3.
  • A. Kobsa and J. Schreck (2003) Privacy through pseudonymity in user-adaptive systems. TOIT 3 (2), pp. 149–183. Cited by: §3.
  • P. W. Koh and P. Liang (2017) Understanding black-box predictions via influence functions. In ICML, pp. 1885–1894. Cited by: §1.
  • Y. Koren, R. Bell, and C. Volinsky (2009) Matrix factorization techniques for recommender systems. Computer (8), pp. 30–37. Cited by: §2.1.1, §2.1.
  • Y. Koren (2008) Factorization meets the neighborhood: a multifaceted collaborative filtering model. In SIGKDD, pp. 426–434. Cited by: §2.1.1, §2.1.
  • J. McAuley and J. Leskovec (2013) Hidden factors and hidden topics: understanding rating dimensions with review text. In RecSys, pp. 165–172. Cited by: §2.1.1, §2.1.1.
  • H. B. McMahan, G. Andrew, U. Erlingsson, S. Chien, I. Mironov, N. Papernot, and P. Kairouz (2018) A general approach to adding differential privacy to iterative training procedures. arXiv preprint arXiv:1812.06210. Cited by: §1, §2.3.
  • H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas (2017) Communication-efficient learning of deep networks from decentralized data. In AISTATS, Cited by: §1, §2.2, §2.2.
  • F. D. McSherry (2009) Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In SIGMOD, pp. 19–30. Cited by: §4.1, §4.2.
  • F. McSherry and I. Mironov (2009) Differentially private recommender systems: building privacy into the netflix prize contenders. In SIGKDD, pp. 627–636. Cited by: §2.3, §3.
  • F. McSherry and K. Talwar (2007) Mechanism design via differential privacy. In FOCS, pp. 94–103. Cited by: §2.3.1.
  • B. N. Miller, J. A. Konstan, and J. Riedl (2004) PocketLens: toward a personal recommender system. TOIS 22 (3), pp. 437–476. Cited by: §3.
  • S. Milli, L. Schmidt, A. D. Dragan, and M. Hardt (2019) Model reconstruction from model explanations. In FAT*, Cited by: §1.
  • K. Nissim, S. Raskhodnikova, and A. Smith (2007) Smooth sensitivity and sampling in private data analysis. In STOC, pp. 75–84. Cited by: §2.3.1.
  • G. Papandreou and A. L. Yuille (2011) Perturb-and-map random fields: using discrete optimization to learn and sample from energy models. In ICCV, pp. 193–200. Cited by: §4.2.
  • D. Pollard (1982) Quantization and the method of k-means. IEEE Transactions on Information theory 28 (2), pp. 199–205. Cited by: Remark 2.
  • T. J. Pollard, A. E. Johnson, J. D. Raffa, L. A. Celi, R. G. Mark, and O. Badawi (2018) The eicu collaborative research database, a freely available multi-center database for critical care research. Scientific data 5. Cited by: §5.1.
  • S. Rendle (2010) Factorization machines. In ICDM, pp. 995–1000. Cited by: §2.1.
  • N. Srebro and R. R. Salakhutdinov (2010) Collaborative filtering in a non-uniform world: learning with the weighted trace norm. In NeurIPS, pp. 2056–2064. Cited by: §2.1.1.
  • Y. Wang, Y. Wang, and A. Singh (2015) Differentially private subspace clustering. In NeurIPS, pp. 1000–1008. Cited by: §2.3.1.
  • X. Wu, F. Li, A. Kumar, K. Chaudhuri, S. Jha, and J. Naughton (2017) Bolt-on differential privacy for scalable stochastic gradient descent-based analytics. In SIGMOD, pp. 1307–1322. Cited by: §2.3.
  • Y. Xin and T. Jaakkola (2014) Controlling privacy in recommender systems. In NeurIPS, pp. 2618–2626. Cited by: §3, §4.1.
  • T. Yang, G. Andrew, H. Eichner, H. Sun, W. Li, N. Kong, D. Ramage, and F. Beaufays (2018) Applied federated learning: improving google keyboard query suggestions. arXiv preprint arXiv:1812.02903. Cited by: §1.
  • S. Zhang, W. Wang, J. Ford, and F. Makedon (2006) Learning from incomplete ratings using non-negative matrix factorization. In ICDM, pp. 549–553. Cited by: §2.1.1.
  • Q. Zhao, F. M. Harper, G. Adomavicius, and J. A. Konstan (2018) Explicit or implicit feedback? engagement or satisfaction?: a field experiment on machine-learning-based recommender systems. In SAC, Cited by: §2.1.
  • T. Zhu, G. Li, W. Zhou, and P. S. Yu (2017) Differentially private data publishing and analysis: a survey. TKDE 29 (8), pp. 1619–1638. Cited by: §2.3.1.

Appendix A Private $k$-means Definitions and Subroutines

A.1 Differential Privacy Definitions

Definition A.1.

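Let $u : \mathcal{D} \times \mathcal{R} \to \mathbb{R}$ be a utility function where $u(D, r)$ measures the utility of outputting $r$ given a dataset $D$. The exponential mechanism outputs $r \in \mathcal{R}$ with probability proportional to $\exp\left(\frac{\epsilon\, u(D, r)}{2 \Delta u}\right)$, where $\Delta u$ is the sensitivity of $u$ defined by

$$\Delta u = \max_{r \in \mathcal{R}} \max_{D, D'} \left| u(D, r) - u(D', r) \right|,$$

with the inner maximum taken over neighboring datasets $D$ and $D'$.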
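To make the definition concrete, the following sketch samples from a finite set of candidate outputs. It assumes the common form of the exponential mechanism, in which an output with utility $u$ is chosen with probability proportional to $\exp(\epsilon u / (2\Delta))$ for sensitivity $\Delta$; the function names are hypothetical and the code is illustrative only.

```python
import math
import random


def selection_probabilities(utilities, epsilon, sensitivity):
    """Probabilities proportional to exp(epsilon * u / (2 * sensitivity))."""
    scale = epsilon / (2.0 * sensitivity)
    top = max(utilities)  # subtracting the max avoids overflow and cancels out
    weights = [math.exp(scale * (u - top)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def exponential_mechanism(utilities, epsilon, sensitivity, rng=None):
    """Sample the index of one candidate output from those probabilities."""
    rng = rng or random.Random()
    probs = selection_probabilities(utilities, epsilon, sensitivity)
    return rng.choices(range(len(utilities)), weights=probs, k=1)[0]
```

Smaller values of `epsilon` flatten the distribution toward uniform, which is the sense in which stronger privacy reduces the influence of the data on the output.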

Definition A.2.

A random variable $Z$ follows a Gumbel distribution with parameter $b$ if its PDF is given by $p(z) = \frac{1}{b} \exp\left(-\left(\frac{z}{b} + e^{-z/b}\right)\right)$.
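One standard use of Gumbel noise alongside the exponential mechanism is the Gumbel-max trick: adding independent Gumbel noise with scale $2\Delta/\epsilon$ to each utility and returning the argmax selects an output with the same distribution as the exponential mechanism with sensitivity $\Delta$. The sketch below assumes the zero-location Gumbel distribution with scale $b$ and uses hypothetical function names; whether the subroutines use the noise in exactly this way is not shown in this section.

```python
import math
import random


def sample_gumbel(scale, rng):
    """Draw from a zero-location Gumbel distribution by inverting its CDF."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() can return exactly 0.0
        u = rng.random()
    return -scale * math.log(-math.log(u))


def gumbel_max_select(utilities, epsilon, sensitivity, rng):
    """Add Gumbel noise to each utility and return the index of the largest."""
    scale = 2.0 * sensitivity / epsilon
    noisy = [u + sample_gumbel(scale, rng) for u in utilities]
    return max(range(len(noisy)), key=noisy.__getitem__)
```

The noisy-argmax form avoids computing a normalizing constant, which is convenient when the candidate set is large.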

A.2 Subroutines





Appendix B Experiments Details

B.1 Experiment 1: Private $k$-means vs $k$-means on Poisson Distributed Data

For this experiment we generated , , and . We set , , and observe average behaviour of private $k$-means. As $\epsilon$ increases, the level of privacy decreases, reducing the $k$-means objective, which approaches the objective achieved by standard, non-private $k$-means.

Figure 3: $k$-means objective vs. level of privacy. As the level of privacy decreases, private $k$-means approaches the objective of non-private $k$-means.

To implement standard $k$-means we used the Python library scikit-learn. For private prototypes, we adapted the publicly available MATLAB code from Balcan et al. (2017) (https://github.com/mouwenlong/dp-clustering-icml17) and reimplemented it in Python.

(a) -means loss vs. for different values of .
(b) -means loss for different values of .
Figure 4: Private $k$-means on synthetic data. Larger values of $\epsilon$, i.e. less privacy, decrease the loss value. A large $k$ does not necessarily result in better performance: for larger numbers of means, the private $k$-means algorithm repeats centers instead of overfitting, and objective minimization stalls.

B.2 Further Experimentation on the Number of Entities

Fig 5 shows the RMSE on the synthetic test dataset described in Section 5.1. We observe that as the number of entities increases, the convergence improves. This is expected since the number of observations used to approximate $V$ also grows.

Figure 5: Convergence of matrix factorization for different numbers of entities.

B.3 Experiment 2: Varying Parameters for Normal Synthetic Data

Figure 6: Comparison of different prototype methods. As and increase, $k$-random exemplars and private $k$-means maintain competitive performance.
Figure 7: Comparison of various methods for different values of . Private methods have superior performance for large .

In Section 5.3 we showed results for fixed values of the number of prototypes $k$ and the number of latent features. Below we show additional plots for different values of those parameters.

In Fig 6 we observe that, as the number of samples increases, $k$-random exemplars outperform $k$-means for all values of . Note that private $k$-means performs well over a wide range of . As increases, private $k$-means converges to the same value for various values of . Fig 7 compares all methods for different values of . The difference in RMSE is clearer for small values of . For large values of , the performance of $k$-random and $k$-private approaches that of matrix factorization.

B.4 Movielens Results

(a) Averaged RMSE on train data for the Movielens 1M dataset.
(b) Averaged RMSE on test data for the Movielens 1M dataset.
Figure 8: Averaged RMSE on the Movielens 1M dataset.
Figure 9: Average rank on the Movielens 1M dataset. Privacy deteriorates performance; however, DP-prototypes allow entities to collaborate and improve recommendations.

Similar to the experiments on the eICU dataset, we observe unsurprisingly in Fig 8 that the global non-private matrix factorization model has lower RMSE than the distributed approaches (i.e., individual models and Private 50-means). However, there is a benefit from collaboration. Recall that an average ranking above 0.5 means a ranking no better than random. Conversely, lower values indicate highly ranked recommendations matching the users’ patterns. We observe in Fig 9 the benefit of collaboration: the quality of recommendations is better for the prototype models than for the local individual models. With a small privacy budget, our method is able to share insights among entities without sacrificing their privacy, and to deliver better recommendations.